#python #airflow #airflow_operators #aws #aws_ec2 #aws_s3 #aws_sdk #cassandra #cassandra_database #cloudformation #cluster #data #data_engineering #data_engineering_pipeline #data_lake #data_modeling #data_warehouse #etl_pipeline #infrastructure #postgres #postgresql_database
https://github.com/san089/Udacity-Data-Engineering-Projects
A few projects related to data engineering, including data modeling, infrastructure setup on cloud, data warehousing, and data lake development.
#python #analytics #dagster #data_orchestrator #data_pipelines #data_science #etl #scheduler #workflow #workflow_automation
https://github.com/dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
#java #airbyte #connectors #data #data_analysis #data_ingestion #data_integration #data_science #data_transfers #elt #etl #incremental_updates #integration #open_source #pipeline #pipelines #replications
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted. ...
#java #airflow #azkaban #dataworks #davinci #etl #flink #governance #griffin #hadoop #hive #hue #kettle #linkis #scriptis #spark #supperset #tableau #visualis #workflow #zeppelin
https://github.com/WeBankFinTech/DataSphereStudio
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, ...
#java #apache #data_integration #data_pipeline #etl_framework #high_performance #offline #real_time #seatunnel #sql_engine
https://github.com/apache/incubator-seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
#java #data #data_engineering #data_orchestration #data_orchestrator #data_pipeline #dataflow #elt #etl #kestra #orchestration #pipeline #scheduler #workflow #workflow_automation #workflow_engine
https://github.com/kestra-io/kestra
Orchestrate everything - from scripts to data, infra, AI, and business - as code, with UI and AI Copilot. Simple. Fast. Scalable.
#scala #etl_pipeline #flink #one_stop_solution #spark #streaming #streaming_warehouse #streamx
https://github.com/streamxhub/streamx
StreamPark makes stream processing easier: an easy-to-use streaming application development framework and operation platform.
#java #bigquery #database #dbt #delta_lake #elt #etl #hadoop #hive #hudi #iceberg #lakehouse #olap #query_engine #real_time #redshift #snowflake #spark #sql
Apache Doris is a high-performance, real-time analytical database. Its architecture is simple, and it supports standard SQL over the MySQL protocol, so existing MySQL clients and tools work with it. Doris keeps queries fast under large data volumes, which suits report analysis, ad-hoc queries, unified data warehouses, and data lake queries. It also supports federated queries across multiple data sources and integrates with tools such as Spark and Flink.
https://github.com/apache/doris
#python #analytics #dagster #data_engineering #data_integration #data_orchestrator #data_pipelines #data_science #etl #metadata #mlops #orchestration #scheduler #workflow #workflow_automation
Dagster is an orchestrator for managing and automating data workflows. You define data assets, such as tables or machine learning models, as Python functions; Dagster runs those functions at the right time and keeps the assets up to date. Built-in lineage and observability make it easier to track what was produced and from which inputs. It covers every stage of data development, from local testing to production, and integrates with other popular data tools. With Dagster you can build reusable components, catch data quality issues early, and scale your pipelines.
https://github.com/dagster-io/dagster
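To make the "assets as Python functions" idea concrete, here is a minimal standard-library sketch. It is a toy, not Dagster's implementation or API: the `asset` decorator, the `_ASSETS` registry, `materialize_all`, and the two sample assets are all hypothetical names chosen for illustration. The one convention it borrows is that a function's parameter names identify the upstream assets it depends on.

```python
import inspect
from graphlib import TopologicalSorter

# Toy registry (hypothetical): maps asset name -> the function that computes it.
_ASSETS = {}


def asset(fn):
    """Register fn as an asset; its parameter names are its upstream assets."""
    _ASSETS[fn.__name__] = fn
    return fn


def materialize_all():
    """Compute every registered asset once, upstream assets first."""
    deps = {
        name: set(inspect.signature(fn).parameters)
        for name, fn in _ASSETS.items()
    }
    values = {}
    # static_order() yields each asset only after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        kwargs = {dep: values[dep] for dep in deps[name]}
        values[name] = _ASSETS[name](**kwargs)
    return values


@asset
def raw_orders():
    # Hypothetical source data; a real asset might read a table or a file.
    return [{"id": 1, "amount": 20.0}, {"id": 2, "amount": 5.5}]


@asset
def order_total(raw_orders):
    # Depends on raw_orders because the parameter is named after that asset.
    return sum(row["amount"] for row in raw_orders)


if __name__ == "__main__":
    print(materialize_all()["order_total"])
```

Inferring dependencies from parameter names keeps each asset an ordinary function that can be called and tested on its own. A real orchestrator adds scheduling, storage, and lineage tracking on top of this kind of graph.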
#python #airflow #apache #apache_airflow #automation #dag #data_engineering #data_integration #data_orchestrator #data_pipelines #data_science #elt #etl #machine_learning #mlops #orchestration #scheduler #workflow #workflow_engine #workflow_orchestration
Apache Airflow is a platform for authoring, scheduling, and monitoring workflows. Workflows are written as code, which makes them easier to maintain, version, test, and collaborate on. Airflow schedules tasks and shows their progress in a web interface. It supports dynamic pipeline generation and is extensible and scalable: you can define your own operators and executors.
Its visualizations of running and completed workflows help you find and troubleshoot failures quickly, which makes complex data processing and other automated tasks easier to manage.
https://github.com/apache/airflow
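As a rough illustration of "workflows as code" with a user-defined operator, the standard-library sketch below mimics two Airflow conventions: an operator class that implements `execute(context)`, and `>>` to declare that one task runs after another. Everything here (`Operator`, `AppendOperator`, `run`, the task names) is a hypothetical toy and does not use Airflow itself, which adds scheduling, retries, executors, and the UI.

```python
from graphlib import TopologicalSorter


class Operator:
    """Toy task base class; subclasses override execute()."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = []

    def __rshift__(self, other):
        # a >> b means "b runs after a"; returning b allows a >> b >> c.
        other.upstream.append(self)
        return other

    def execute(self, context):
        raise NotImplementedError


class AppendOperator(Operator):
    """Hypothetical custom operator: appends a word to a shared log."""

    def __init__(self, task_id, word):
        super().__init__(task_id)
        self.word = word

    def execute(self, context):
        context["log"].append(self.word)


def run(tasks):
    """Run every task once, in dependency order, and return the context."""
    graph = {t.task_id: {u.task_id for u in t.upstream} for t in tasks}
    by_id = {t.task_id: t for t in tasks}
    context = {"log": []}
    for task_id in TopologicalSorter(graph).static_order():
        by_id[task_id].execute(context)
    return context


# The workflow itself is plain code: three tasks chained in order.
extract = AppendOperator("extract", "extract")
transform = AppendOperator("transform", "transform")
load = AppendOperator("load", "load")
extract >> transform >> load

if __name__ == "__main__":
    print(run([load, extract, transform])["log"])
```

Because the pipeline is ordinary Python, it can be generated in a loop, reviewed in version control, and unit-tested like any other module.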