GitHub Trends
10.1K subscribers
15.3K links
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
#c_lang #bigdata #cloud_native #cluster #connected_vehicles #database #distributed #financial_analysis #industrial_iot #iot #metrics #monitoring #scalability #sql #tdengine #time_series #time_series_database #tsdb

TDengine is a powerful, open-source time-series database designed for handling large amounts of data from IoT devices, connected cars, and industrial IoT. Here are the key benefits:
- It can handle billions of data collection points efficiently, outperforming other time-series databases in data ingestion, querying, and compression.
- **Simplified Solution**: Designed for cloud environments, it supports distributed design, sharding, partitioning, and Kubernetes deployment.
- **Ease of Use**: Makes data exploration and access efficient through features like super tables and pre-computation.
- **Open Source**: Available under open source licenses with an active developer community.

Using TDengine helps you manage and analyze large-scale time-series data efficiently, making it ideal for various IoT and industrial applications.

https://github.com/taosdata/TDengine
#go #batch_systems #bigdata #gene #golang #hpc #kubernetes #machine_learning

Volcano is a powerful batch system built on Kubernetes, designed to manage complex workloads like machine learning, bioinformatics, and big data applications. It integrates with popular frameworks such as TensorFlow, Spark, and PyTorch. Volcano provides efficient scheduling and management of high-performance workloads, drawing on over 15 years of experience and best practices from the open source community. It is widely used across industries and has strong community support, with hundreds of contributors. Installing Volcano is straightforward, either through YAML files or Helm charts, which makes it easy to get started and manage your batch workloads.

https://github.com/volcano-sh/volcano
#other #apachespark #awesome #bigdata #data #dataengineering #sql

This handbook is a comprehensive guide to help you become a great data engineer. It provides a roadmap to get started, including hands-on projects, interview tips, and recommended books. You can join various communities and follow newsletters to stay updated. The handbook also lists top companies, blogs, whitepapers, YouTube channels, podcasts, and courses that can help you learn and grow in data engineering. Using these resources, you can gain practical knowledge, network with professionals, and stay informed about the latest trends and technologies in the field. This will help you build a strong foundation and advance your career as a data engineer.

https://github.com/DataExpert-io/data-engineer-handbook
#java #bigdata #data_encryption #data_pipeline #database #database_cluster #database_gateway #database_middleware #distributed_database #distributed_sql_database #distributed_transaction #encrypt #mysql #postgresql #read_write_splitting #shard #sql

Apache ShardingSphere is a powerful tool that helps manage and scale databases. It allows you to break down large databases into smaller pieces (sharding), handle more data traffic (scaling), and secure your data with encryption. This tool works with any database and provides a unified way for applications to interact with multiple databases as if they were one.

The benefits include:
- Your database can handle more data and users without slowing down.
- **Improved Security**: Applications only need to communicate with one standardized service, making it simpler to manage.
- **Flexibility**: You can customize the tool to fit your needs using its pluggable architecture.

Overall, Apache ShardingSphere makes managing and scaling databases much easier and more efficient.
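
To make the sharding idea above concrete, here is a minimal Python sketch of hash-based routing behind a single interface. It is not ShardingSphere's API or its actual algorithm: the shard names, `shard_for`, and `ShardedStore` are hypothetical, and in-memory dictionaries stand in for real databases.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database connections.
SHARDS = ["orders_db_0", "orders_db_1", "orders_db_2", "orders_db_3"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard by hashing the sharding key (stable across processes)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


class ShardedStore:
    """Toy in-memory stand-in for several databases behind one interface."""

    def __init__(self, shards: list[str] = SHARDS) -> None:
        self._shards = list(shards)
        self._data = {name: {} for name in self._shards}

    def put(self, key: str, row: dict) -> None:
        # The caller never names a shard; routing happens here.
        self._data[shard_for(key, self._shards)][key] = row

    def get(self, key: str):
        # The same key always hashes to the same shard, so lookups find the row.
        return self._data[shard_for(key, self._shards)].get(key)

    def rows_per_shard(self) -> dict:
        return {name: len(rows) for name, rows in self._data.items()}


store = ShardedStore()
for i in range(100):
    store.put(f"user-{i}", {"id": i})
```

The application only talks to `store`, while each row lands on exactly one shard. Plain modulo routing like this reshuffles most keys when the number of shards changes, which is one reason production systems use more elaborate strategies.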

https://github.com/apache/shardingsphere
#go #bigdata #cloud_native #distributed_systems #filesystem #go #golang #hdfs #object_storage #posix #redis #s3 #storage

JuiceFS is a powerful file system designed for cloud environments. It allows you to use massive cloud storage as if it were local storage, without changing your code. Here are the key benefits:
- JuiceFS offers low latency and high throughput, making it suitable for big data, machine learning, and AI applications.
- **POSIX Compatibility**: Supports Kubernetes and various object storage services like Amazon S3, Google Cloud Storage, and more.
- **Strong Consistency**: Ensures data security and efficiency.
- **Shared Access**: Multiple clients can read and write files simultaneously.

Using JuiceFS, you can efficiently manage large amounts of data in the cloud, making it easier to integrate with various platforms and applications.
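
As a rough illustration of what "without changing your code" means for a POSIX-compatible file system, the sketch below uses only ordinary Python file APIs. Nothing in it is JuiceFS-specific: the function name `roundtrip` is made up for this example, and a temporary local directory stands in for a mount point (for instance a hypothetical `/mnt/jfs`) so that the code runs anywhere.

```python
import tempfile
from pathlib import Path


def roundtrip(mount_point: Path) -> str:
    """Create, rename, and read a file using standard file operations only."""
    data_dir = mount_point / "datasets"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Write to a temporary name first, then rename into place.
    draft = data_dir / "sample.csv.tmp"
    draft.write_text("id,value\n1,42\n", encoding="utf-8")
    final = data_dir / "sample.csv"
    draft.rename(final)

    return final.read_text(encoding="utf-8")


# Assumption: a temporary directory plays the role of the mounted file system.
with tempfile.TemporaryDirectory() as tmp:
    content = roundtrip(Path(tmp))
```

If the directory passed in were a mounted POSIX-compatible volume instead of a temporary folder, the same function would be expected to work unchanged, which is the point of the compatibility claim.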

https://github.com/juicedata/juicefs
๐Ÿ‘1๐Ÿ‘Ž1
#rust #bigdata #cloud_native #distributed_systems #filesystem #minio #object_storage #oss #rust #s3

RustFS is a fast and safe distributed object storage system built with Rust, offering high performance and scalability for large data needs like AI and big data. It is S3-compatible, easy to use, and open source under the business-friendly Apache 2.0 license. Compared to alternatives like MinIO, RustFS provides better memory safety, no risky data logging, and support for local cloud providers. You can quickly install it via a script or Docker, manage storage through a simple web console, and benefit from a strong community and detailed documentation. This makes RustFS a reliable, cost-effective choice for secure, scalable storage.

https://github.com/rustfs/rustfs
#rust #ai #bigdata #database #lakehouse #olap #rust #serverless #snowflake #sql

Databend is an open-source cloud data warehouse built with Rust that offers a fast, cost-effective alternative to Snowflake. It supports both cloud and on-premise deployment, handles massive data (over 800 petabytes), and processes over 100 million queries daily. Databend excels in fast query execution, real-time data updates, and simplified data ingestion without extra ETL tools. It includes AI-powered analytics, advanced indexing, ACID compliance, and flexible schema support for semi-structured data. Using Databend can save you money, give you full control over your data, and provide high performance for complex analytics on large datasets [1][3].

https://github.com/databendlabs/databend