GitHub Trends

871 views09:03

#java #batch #cdc #change_data_capture #data_integration #data_pipeline #distributed #elt #etl #flink #kafka #mysql #paimon #postgresql #real_time #schema_evolution

Flink CDC is a tool that helps you move and transform data in real-time or in batches. It makes data integration simple by using YAML files to describe how data should be moved and transformed. This tool offers features like full database synchronization, table sharding, schema evolution, and data transformation. To use it, you need to set up an Apache Flink cluster, download Flink CDC, create a YAML file to define your data sources and sinks, and then run the job. This benefits you by making it easier to manage and integrate your data efficiently across different databases.

https://github.com/apache/flink-cdc

GitHub

GitHub - apache/flink-cdc: Flink CDC is a streaming data integration tool

Flink CDC is a streaming data integration tool. Contribute to apache/flink-cdc development by creating an account on GitHub.

513 views14:30

GitHub Trends

#java #apache #batch #cdc #change_data_capture #data_ingestion #data_integration #elt #high_performance #offline #real_time #streaming

Apache SeaTunnel is a powerful tool for integrating and synchronizing large amounts of data from various sources. It supports over 100 connectors, allowing you to connect to many different data sources. SeaTunnel is efficient, stable, and resource-friendly, minimizing the use of computing resources and JDBC connections. It also provides real-time monitoring and ensures data quality to prevent loss or duplication. You can use it with different execution engines like Flink, Spark, and SeaTunnel Zeta Engine. This tool is beneficial because it simplifies complex data synchronization tasks, offers high throughput with low latency, and provides detailed insights during the process. Additionally, it has a user-friendly web project for visual job management, making it easier to manage your data integration tasks.

https://github.com/apache/seatunnel

GitHub

GitHub - apache/seatunnel: SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.

SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool. - apache/seatunnel

456 views00:00

GitHub Trends

#rust #ai #change_data_capture #context_engineering #data #data_engineering #data_indexing #data_infrastructure #data_processing #etl #hacktoberfest #help_wanted #indexing #knowledge_graph #llm #pipeline #python #rag #real_time #rust #semantic_search

**CocoIndex** is a fast, open-source Python tool (Rust core) for transforming data into AI formats like vector indexes or knowledge graphs. Define simple data flows in ~100 lines of code using plug-and-play blocks for sources, embeddings, and targets—install via `pip install cocoindex`, add Postgres, and run. It auto-syncs fresh data with minimal recompute on changes, tracking lineage. **You save time building scalable RAG/semantic search pipelines effortlessly, avoiding complex ETL and stale data issues for production-ready AI apps.**

https://github.com/cocoindex-io/cocoindex

GitHub

GitHub - cocoindex-io/cocoindex: Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if…

Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it! - cocoindex-io/cocoindex

329 views11:30

About

Blog

Apps

Platform