GitHub Trends – Telegram

GitHub Trends

@githubtrending

10.1K subscribers

15.3K links

See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis


#rust #arrow #dataframe #datafusion #distributed #java #jvm #kotlin #kubernetes #scala #spark

https://github.com/ballista-compute/ballista

GitHub - ballista-compute/ballista: Distributed compute platform implemented in Rust, and powered by Apache Arrow.

Distributed compute platform implemented in Rust, and powered by Apache Arrow.

1.47K views, 12:35

#other #big_data #spark #pyspark

https://github.com/ankurchavda/SparkLearning

GitHub - ankurchavda/SparkLearning: A comprehensive Spark guide collated from multiple sources that can be referred to learn more…

A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.

905 views, 12:35

#scala #ai #apache_spark #azure #big_data #cognitive_services #data_science #databricks #deep_learning #http #lightgbm #machine_learning #microsoft #ml #model_deployment #onnx #opencv #pyspark #spark #synapse

https://github.com/microsoft/SynapseML

GitHub - microsoft/SynapseML: Simple and Distributed Machine Learning

Simple and Distributed Machine Learning.

939 views, 09:01

#java #airflow #azkaban #dataworks #davinci #etl #flink #governance #griffin #hadoop #hive #hue #kettle #linkis #scriptis #spark #supperset #tableau #visualis #workflow #zeppelin

https://github.com/WeBankFinTech/DataSphereStudio

GitHub - WeBankFinTech/DataSphereStudio: DataSphereStudio is a one-stop data application development & management portal, covering…

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, ...

990 views, 09:00

#hcl #argocd #aws #aws_eks_cluster #bottlerocket #cluster_autoscaler #eks #eks_addons #eks_clusters #eks_fargate #fargate #fluentbit #fluxcd #helm_charts #ingress_controller #kubernetes #spark #spark_operator #terraform #traefik #yunikorn

https://github.com/aws-ia/terraform-aws-eks-blueprints

GitHub - aws-ia/terraform-aws-eks-blueprints: Configure and deploy complete EKS clusters.

Configure and deploy complete EKS clusters.

628 views, 10:57

#scala #etl_pipeline #flink #one_stop_solution #spark #streaming #streaming_warehouse #streamx

https://github.com/streamxhub/streamx

GitHub - apache/incubator-streampark: StreamPark, Make stream processing easier! easy-to-use streaming application development…

StreamPark, Make stream processing easier! easy-to-use streaming application development framework and operation platform

552 views, 10:58

#cplusplus #big_data #clickhouse #distributed_database #lakehouse #olap_database #spark #sql #ytsaurus

https://github.com/ytsaurus/ytsaurus

GitHub - ytsaurus/ytsaurus: YTsaurus is a scalable and fault-tolerant open-source big data platform.

YTsaurus is a scalable and fault-tolerant open-source big data platform.

1.17K views, 10:56

#scala #clickhouse #simd #spark_sql #vectorization #velox

https://github.com/oap-project/gluten

GitHub - apache/incubator-gluten: Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native…

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

1.26K views, 19:58

#scala #big_data #gpu #rapids #spark

https://github.com/NVIDIA/spark-rapids

GitHub - NVIDIA/spark-rapids: Spark RAPIDS plugin - accelerate Apache Spark with GPUs

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

1.71K views, 11:56

#jupyter_notebook #ai #aihub #argo #automl #gpt #inference #kubeflow #kubernetes #llmops #mlops #notebook #pipeline #pytorch #spark #vgpu #workflow

https://github.com/tencentmusic/cube-studio

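GitHub - tencentmusic/cube-studio: cube studio is an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform covering the full MLOps pipeline: compute leasing platform, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU…

cube studio is an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform covering the full MLOps pipeline: compute leasing platform, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, annotation platform with automated labeling, SFT fine-tuning / reward model / reinforcement learning training for large models such as DeepSeek, multi-node large-model inference with vLLM/Ollama/MindIE, private knowledge base, AI model marketplace...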

👍3

1.52K views, 13:57

#java #bigquery #database #dbt #delta_lake #elt #etl #hadoop #hive #hudi #iceberg #lakehouse #olap #query_engine #real_time #redshift #snowflake #spark #sql

Apache Doris is a high-performance, real-time analytical database. It has a simple architecture and supports standard SQL, which makes it compatible with MySQL tools. Doris delivers fast query performance even under massive data loads, which suits scenarios such as report analysis, ad-hoc queries, unified data warehouses, and data lake queries. It also supports federated querying of various data sources and integrates with ecosystem tools such as Spark and Flink.

https://github.com/apache/doris

GitHub - apache/doris: Apache Doris is an easy-to-use, high performance and unified analytics database.

Apache Doris is an easy-to-use, high performance and unified analytics database.

306 views, 11:18