#jupyter_notebook #big_data #bigdata #cortx_community #distributed_storage #distributed_systems #hackathons #hacktoberfest #hacktoberfest2020 #inclusivity #object_storage #object_storage_service #objectstorage #objectstore #open_source #opensource #s3 #s3_storage #software_defined_storage #storage #storage_api
https://github.com/Seagate/cortx
GitHub
GitHub - Seagate/cortx: CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity…
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices. - Seagate/cortx
#java #artificial_intelligence #bigdata #data #machine_learning #metadata
https://github.com/open-metadata/OpenMetadata
GitHub
GitHub - open-metadata/OpenMetadata: OpenMetadata is a unified metadata platform for data discovery, data observability, and data…
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team co...
#c_lang #bigdata #cloud_native #cluster #connected_vehicles #database #distributed #financial_analysis #industrial_iot #iot #metrics #monitoring #scalability #sql #tdengine #time_series #time_series_database #tsdb
TDengine is a powerful, open-source time-series database designed for handling large amounts of data from IoT devices, connected cars, and industrial IoT. Here are the key benefits:
- **High Performance**: It can handle billions of data collection points efficiently, and the project reports that it outperforms other time-series databases in data ingestion, querying, and compression.
- **Cloud Native**: Designed for cloud environments, it supports distributed design, sharding, partitioning, and Kubernetes deployment.
- **Ease of Use**: Makes data exploration and access efficient through features like super tables and pre-computation.
- **Open Source**: Available under open source licenses with an active developer community.
Using TDengine helps you manage and analyze large-scale time-series data efficiently, making it ideal for various IoT and industrial applications.
https://github.com/taosdata/TDengine
GitHub
GitHub - taosdata/TDengine: High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios
High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios - taosdata/TDengine
#go #batch_systems #bigdata #gene #golang #hpc #kubernetes #machine_learning
Volcano is a batch system built on Kubernetes, designed to manage complex workloads like machine learning, bioinformatics, and big data applications. It integrates with popular frameworks such as TensorFlow, Spark, and PyTorch. Volcano provides efficient scheduling and management of high-performance workloads, drawing on over 15 years of experience and best practices from the open source community. It is used across various industries and has strong community support, with hundreds of contributors. You can install Volcano from YAML files or with Helm charts, which makes it easy to get started and manage your batch workloads.
https://github.com/volcano-sh/volcano
GitHub
GitHub - volcano-sh/volcano: A Cloud Native Batch System (Project under CNCF)
A Cloud Native Batch System (Project under CNCF). Contribute to volcano-sh/volcano development by creating an account on GitHub.
#other #apachespark #awesome #bigdata #data #dataengineering #sql
This handbook is a comprehensive guide to help you become a great data engineer. It provides a roadmap to get started, including hands-on projects, interview tips, and recommended books. You can join various communities and follow newsletters to stay updated. The handbook also lists top companies, blogs, whitepapers, YouTube channels, podcasts, and courses that can help you learn and grow in data engineering. Using these resources, you can gain practical knowledge, network with professionals, and stay informed about the latest trends and technologies in the field. This will help you build a strong foundation and advance your career as a data engineer.
https://github.com/DataExpert-io/data-engineer-handbook
GitHub
GitHub - DataExpert-io/data-engineer-handbook: This is a repo with links to everything you'd ever want to learn about data engineering
This is a repo with links to everything you'd ever want to learn about data engineering - DataExpert-io/data-engineer-handbook
#java #bigdata #data_encryption #data_pipeline #database #database_cluster #database_gateway #database_middleware #distributed_database #distributed_sql_database #distributed_transaction #encrypt #mysql #postgresql #read_write_splitting #shard #sql
Apache ShardingSphere is a powerful tool that helps manage and scale databases. It allows you to break down large databases into smaller pieces (sharding), handle more data traffic (scaling), and secure your data with encryption. It sits in front of your existing databases and provides a unified way for applications to interact with multiple databases as if they were one.
The benefits include:
- **Scalability**: Your database can handle more data and users without slowing down.
- **Simplified Management**: Applications only need to communicate with one standardized service, making it simpler to manage.
- **Improved Security**: Data can be protected with encryption.
- **Flexibility**: You can customize the tool to fit your needs using its pluggable architecture.
Overall, Apache ShardingSphere makes managing and scaling databases much easier and more efficient.
https://github.com/apache/shardingsphere
GitHub
GitHub - apache/shardingsphere: Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across…
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases. - apache/shardingsphere
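To make the sharding idea in the ShardingSphere post concrete, here is a minimal sketch in Python of hash-based routing behind a single entry point. It is not ShardingSphere's API or algorithm: `ShardedStore`, its methods, and the use of in-memory dicts as stand-ins for databases are all assumptions made for illustration.

```python
import zlib


class ShardedStore:
    """Toy router (hypothetical): one logical table spread over several shards.

    Each shard is a plain dict standing in for a separate database.
    """

    def __init__(self, shard_count: int) -> None:
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, key: str) -> int:
        # crc32 is stable across processes, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key: str, row: dict) -> None:
        # Writes are routed to exactly one shard chosen by the sharding key.
        self.shards[self.shard_for(key)][key] = row

    def get(self, key: str):
        # Point reads touch only the shard that owns the key.
        return self.shards[self.shard_for(key)].get(key)

    def scan(self) -> dict:
        # A query without a sharding key fans out to every shard and merges results.
        merged = {}
        for shard in self.shards:
            merged.update(shard)
        return merged
```

The caller only ever talks to `ShardedStore`, which mirrors the "one standardized service" point above: the application does not need to know which shard holds a given row.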
#go #bigdata #cloud_native #distributed_systems #filesystem #go #golang #hdfs #object_storage #posix #redis #s3 #storage
JuiceFS is a powerful file system designed for cloud environments. It allows you to use massive cloud storage as if it were local storage, without changing your code. Here are the key benefits:
- **High Performance**: JuiceFS offers low latency and high throughput, making it suitable for big data, machine learning, and AI applications.
- **POSIX Compatibility**: It mounts like local storage, so existing code works unchanged.
- **Cloud Native**: Supports Kubernetes and various object storage services like Amazon S3, Google Cloud Storage, and more.
- **Strong Consistency**: A confirmed change is immediately visible to every client that mounts the same file system.
- **Encryption and Compression**: Ensures data security and efficiency.
- **Shared Access**: Multiple clients can read and write files simultaneously.
Using JuiceFS, you can efficiently manage large amounts of data in the cloud, making it easier to integrate with various platforms and applications.
https://github.com/juicedata/juicefs
GitHub
GitHub - juicedata/juicefs: JuiceFS is a distributed POSIX file system built on top of Redis and S3.
JuiceFS is a distributed POSIX file system built on top of Redis and S3. - juicedata/juicefs
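The "without changing your code" point in the JuiceFS post can be illustrated with ordinary file I/O. The sketch below is plain Python standard library code with nothing JuiceFS-specific in it; the function name `save_and_load` and the mount path `/jfs` in the usage note are hypothetical. The assumption is that a POSIX-compatible mount accepts the same calls as a local directory.

```python
from pathlib import Path


def save_and_load(root: Path, name: str, text: str) -> str:
    """Write a text file under `root` and read it back with standard file calls."""
    target = root / name
    # Create intermediate directories, exactly as on a local disk.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target.read_text(encoding="utf-8")
```

Calling `save_and_load(Path("/jfs"), "reports/today.txt", "ok")` would look the same whether `/jfs` is a local directory or, by assumption, a mounted JuiceFS volume; only the path changes, not the code.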
#rust #bigdata #cloud_native #distributed_systems #filesystem #minio #object_storage #oss #rust #s3
RustFS is a fast and safe distributed object storage system built with Rust, offering high performance and scalability for large data needs like AI and big data. It is compatible with S3, easy to use, and open source under the business-friendly Apache 2.0 license. Compared to others like MinIO, RustFS provides better memory safety, no risky data logging, and supports local cloud providers. You can quickly install it via a script or Docker, manage storage through a simple web console, and benefit from a strong community and detailed documentation. This makes RustFS a reliable, cost-effective choice for secure, scalable storage.
https://github.com/rustfs/rustfs
GitHub
GitHub - rustfs/rustfs: 2.3x faster than MinIO for 4KB object payloads. RustFS is an open-source, S3-compatible high-performance…
2.3x faster than MinIO for 4KB object payloads. RustFS is an open-source, S3-compatible high-performance object storage system supporting migration and coexistence with other S3-compatible platfor...
#rust #ai #bigdata #database #lakehouse #olap #rust #serverless #snowflake #sql
Databend is an open-source cloud data warehouse built with Rust that offers a fast, cost-effective alternative to Snowflake. It supports both cloud and on-premise deployment, handles massive data volumes (over 800 petabytes), and processes over 100 million queries daily. Databend excels in fast query execution, real-time data updates, and simplified data ingestion without extra ETL tools. It includes AI-powered analytics, advanced indexing, ACID compliance, and flexible schema support for semi-structured data. Using Databend can save you money, give you full control over your data, and provide high performance for complex analytics on large datasets[1][3].
https://github.com/databendlabs/databend
GitHub
GitHub - databendlabs/databend: AI-Native Data Warehouse. Blazing analytics, fast search, geo insights, vector AI. Built for multimodal…
AI-Native Data Warehouse. Blazing analytics, fast search, geo insights, vector AI. Built for multimodal analytics, Open-source Snowflake alternative. https://databend.com - databendlabs/databend