GitHub Trends
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
#java #bigquery #database #dbt #delta_lake #elt #etl #hadoop #hive #hudi #iceberg #lakehouse #olap #query_engine #real_time #redshift #snowflake #spark #sql

Apache Doris is a high-performance, real-time analytical database. Its architecture is simple, it supports standard SQL, and it is compatible with the MySQL protocol, so existing MySQL tools work with it out of the box. Doris delivers very fast query performance even under massive data loads, which makes it a good fit for report analysis, ad-hoc queries, unified data warehousing, and data lake queries. It also supports federated queries over external data sources and integrates with ecosystem tools such as Spark and Flink.
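
Since Doris speaks the MySQL wire protocol, any MySQL client library can query it. Below is a minimal sketch in Python using pymysql, assuming a local frontend on the default query port 9030 and a hypothetical demo.site_visits table; host, credentials, and table names are placeholders.

```python
import pymysql

# Connect to the Doris frontend over its MySQL-compatible protocol.
# 9030 is the default FE query port; host, user, and database are placeholders.
conn = pymysql.connect(
    host="127.0.0.1",
    port=9030,
    user="root",
    password="",
    database="demo",
)
try:
    with conn.cursor() as cur:
        # Standard SQL: the kind of ad-hoc aggregation Doris is built for.
        cur.execute(
            "SELECT city, COUNT(*) AS visits "
            "FROM site_visits GROUP BY city ORDER BY visits DESC LIMIT 10"
        )
        for city, visits in cur.fetchall():
            print(city, visits)
finally:
    conn.close()
```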

https://github.com/apache/doris
#python #airflow #apache #apache_airflow #automation #dag #data_engineering #data_integration #data_orchestrator #data_pipelines #data_science #elt #etl #machine_learning #mlops #orchestration #scheduler #workflow #workflow_engine #workflow_orchestration

Apache Airflow is a tool for managing and automating workflows. You write workflows as code, which makes them easier to maintain, version, test, and collaborate on. Airflow schedules tasks and lets you monitor their progress through a user-friendly interface. It supports dynamic pipeline generation, and it is extensible and scalable: you can define your own operators and executors.

Using Airflow benefits you by making your workflows more organized, efficient, and reliable. It simplifies the process of managing complex tasks and provides clear visualizations of your workflow's performance, helping you identify and troubleshoot issues quickly. This makes it easier to manage data processing and other automated tasks effectively.
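
As a minimal sketch of the workflows-as-code idea, here is a two-task DAG written with the TaskFlow API of Airflow 2.x; the task bodies are stand-ins for real extract and load logic.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_etl():
    @task
    def extract() -> list:
        # Placeholder for pulling rows from a source system.
        return [1, 2, 3]

    @task
    def load(rows: list) -> None:
        # Placeholder for writing rows to a warehouse.
        print(f"loaded {len(rows)} rows")

    # Declaring the data dependency also sets the task order: extract -> load.
    load(extract())


example_etl()
```

Because the DAG is plain Python, it can be versioned in git, reviewed, and unit-tested like any other code.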

https://github.com/apache/airflow
#java #batch #cdc #change_data_capture #data_integration #data_pipeline #distributed #elt #etl #flink #kafka #mysql #paimon #postgresql #real_time #schema_evolution

Flink CDC is a tool for moving and transforming data, either in real time or in batches. It keeps data integration simple by describing how data is moved and transformed in YAML files. It offers full-database synchronization, table sharding, schema evolution, and data transformation. To use it, you set up an Apache Flink cluster, download Flink CDC, write a YAML file defining your sources and sinks, and run the job. This makes it much easier to manage and integrate data across different databases.
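
As an illustration, a pipeline definition modeled on the project's MySQL-to-Doris quickstart might look like the YAML below; the exact keys can vary between Flink CDC versions, and all hosts, credentials, and table patterns are placeholders.

```yaml
# Continuously capture changes from MySQL and mirror them into Doris.
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: "123456"
  tables: app_db.\.*   # regex: every table in the app_db database

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
```

The job is then submitted to the Flink cluster with the launcher shipped in the Flink CDC distribution, along the lines of bash bin/flink-cdc.sh mysql-to-doris.yaml.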

https://github.com/apache/flink-cdc
#java #apache #batch #cdc #change_data_capture #data_ingestion #data_integration #elt #high_performance #offline #real_time #streaming

Apache SeaTunnel is a tool for integrating and synchronizing large amounts of data from many sources. It supports over 100 connectors and can run on different execution engines, including Flink, Spark, and its own SeaTunnel Zeta engine. It is efficient, stable, and resource-friendly, minimizing compute usage and JDBC connections, and it provides real-time monitoring and data-quality checks to prevent loss or duplication. It offers high throughput with low latency and gives detailed insights while a job runs. A companion web project provides visual job management, making data integration tasks easier to operate.
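
For a feel of the configuration style, here is an illustrative batch job modeled on the project's quickstart, wiring the built-in FakeSource connector to a Console sink; connector names and option keys are assumptions that may differ by SeaTunnel version.

```
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  # Generates synthetic rows so the pipeline can run without external systems.
  FakeSource {
    row.num = 16
    schema = {
      fields {
        name = "string"
        age  = "int"
      }
    }
  }
}

sink {
  # Prints every row it receives; useful for smoke-testing a pipeline.
  Console {}
}
```

The job would then be submitted with the seatunnel.sh launcher from the distribution, choosing Zeta, Flink, or Spark as the execution engine.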

https://github.com/apache/seatunnel