GitHub Trends

#python #billion_parameters #compression #data_parallelism #deep_learning #gpu #inference #machine_learning #mixture_of_experts #model_parallelism #pipeline_parallelism #pytorch #trillion_parameters #zero

DeepSpeed is a deep learning optimization library for training and running large AI models quickly and efficiently. It lets you train models with billions or even trillions of parameters, and aims to do so faster and at lower cost than comparable approaches. For example, the project reports that it can train ChatGPT-like models 15 times faster than state-of-the-art systems. This makes it easier to work with large language models without needing massive resources.

https://github.com/microsoft/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. - deepspeedai/DeepSpeed

369 views · 16:30

GitHub Trends

#cplusplus #android #audio_processing #c_plus_plus #calculator #computer_vision #deep_learning #framework #graph_based #graph_framework #inference #machine_learning #mediapipe #mobile_development #perception #pipeline_framework #stream_processing #video_processing

MediaPipe is a tool that helps you add smart machine learning features to your apps and devices. It works on mobile, web, desktop, and other devices. You can use pre-made solutions for tasks like vision, text, and audio processing, or customize the models to fit your needs. MediaPipe also offers tools like Model Maker and Studio to help you create and test your solutions easily. This makes it easier to delight your customers with innovative features without needing deep machine learning expertise.

https://github.com/google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media. - google-ai-edge/mediapipe

333 views · 20:30

GitHub Trends

#typescript #cd #ci #git #gitlab #gitlab_ci #local #pipeline #push #uncomitted #untracked

You can run GitLab CI pipelines locally using `gitlab-ci-local`, which saves you from pushing changes just to test your `.gitlab-ci.yml` files. The tool runs pipelines with either a shell executor or a Docker executor, so you no longer need development-specific scripts. It also offers convenience features like CLI options, environment files, bash aliases, and tab completion. You can list pipeline jobs before running them and customize variables and artifacts easily.

https://github.com/firecow/gitlab-ci-local

Tired of pushing to test your .gitlab-ci.yml? - firecow/gitlab-ci-local

2.48K views · 11:30

GitHub Trends

#python #artificial_intelligence #dag #data_science #data_visualization #dataflow #developer_tools #machine_learning #notebooks #pipeline #python #reactive #web_app

Marimo is a reactive notebook for Python that makes working with notebooks easier and more reliable. Here’s what it offers:
- **Reactive**: When you run a cell or interact with UI elements, marimo automatically updates dependent cells, keeping your code and outputs consistent.
- **Reproducible**: Marimo ensures no hidden state and deterministic execution, making your work reliable.
- **Git-friendly**: Notebooks are stored as `.py` files, making version control easy.
- **Modern Editor**: It includes features like GitHub Copilot, AI assistants, and more quality-of-life tools.

Using marimo helps you avoid errors, keeps your code organized, and makes sharing and deploying your work simpler.
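
To make the idea of "automatically updating dependent cells" concrete, here is a toy sketch in plain Python. It is not marimo's API or implementation: the `Cell` class and `run_reactively` function are hypothetical names invented for illustration, and the sketch assumes cells are listed in dependency order and that each cell defines exactly one variable.

```python
# Toy model of reactive execution. NOT marimo's API: every name here is hypothetical.

class Cell:
    def __init__(self, name, reads, writes, fn):
        self.name = name
        self.reads = set(reads)  # variables this cell depends on
        self.writes = writes     # the single variable this cell defines
        self.fn = fn             # computes the new value from the shared state


def run_reactively(cells, state, changed_var):
    """Re-run every cell that directly or transitively reads `changed_var`.

    Assumes `cells` is already listed in dependency order.
    """
    dirty = {changed_var}
    executed = []
    for cell in cells:
        if cell.reads & dirty:
            state[cell.writes] = cell.fn(state)
            dirty.add(cell.writes)  # downstream readers must now re-run too
            executed.append(cell.name)
    return executed


cells = [
    Cell("double", ["x"], "y", lambda s: s["x"] * 2),
    Cell("label", ["y"], "text", lambda s: f"y = {s['y']}"),
    Cell("unrelated", ["z"], "w", lambda s: s["z"] + 1),
]

state = {"x": 1, "z": 10}
for cell in cells:  # initial full run
    state[cell.writes] = cell.fn(state)

state["x"] = 5  # the user edits x
ran = run_reactively(cells, state, "x")
print(ran, state["text"])
```

Only the cells downstream of `x` run again; the `unrelated` cell is left alone, which is the property that keeps code and outputs consistent without re-running the whole notebook.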

https://github.com/marimo-team/marimo

A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor....

351 views · 13:30

GitHub Trends

#rust #events #forwarder #logs #metrics #observability #parser #pipeline #router #rust #stream_processing #vector

Vector is a tool for managing your observability data, such as logs and metrics. It lets you collect, transform, and route that data to any vendor you choose, giving you full control. The project describes Vector as reliable, secure, and fast (up to 10x faster than alternatives). It helps reduce costs, improve data quality, and consolidate agents, making your observability processes more efficient. With strong community support and extensive documentation, Vector is used by many big companies and is downloaded over 100,000 times daily.

https://github.com/vectordotdev/vector

A high-performance observability data pipeline. - vectordotdev/vector

424 views · 16:30

GitHub Trends

#python #automation #data #data_engineering #data_ops #data_science #infrastructure #ml_ops #observability #orchestration #pipeline #prefect #python #workflow #workflow_engine

Prefect is a tool that helps you automate and manage data workflows in Python. It makes it easy to turn your scripts into reliable and flexible workflows that can handle unexpected changes. With Prefect, you can schedule tasks, retry failed operations, and monitor your workflows. You can install it using `pip install -U prefect` and start creating workflows with just a few lines of code. This helps data teams work more efficiently, reduce errors, and save time. You can also use Prefect Cloud for more advanced features and support.
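
As an illustration of what "retry failed operations" means in practice, here is a minimal sketch using only the Python standard library. It deliberately does not use Prefect's API: `with_retries` and `flaky_fetch` are hypothetical names, and Prefect provides its own decorators for this, documented in the project.

```python
import functools
import time


def with_retries(retries=3, delay_seconds=0.0):
    """Hypothetical decorator: re-run a function up to `retries` extra times on failure."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise  # out of attempts: surface the original error
                    time.sleep(delay_seconds)
        return wrapper
    return decorator


calls = {"count": 0}


@with_retries(retries=2)
def flaky_fetch():
    """Simulated task that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "payload"


result = flaky_fetch()
print(result, calls["count"])
```

A workflow engine adds scheduling, logging, and monitoring around this basic loop, so a transient failure does not bring down the whole pipeline.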

https://github.com/PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python. - PrefectHQ/prefect

374 views · 14:00

GitHub Trends

#python #cloud_native #cncf #deep_learning #docker #fastapi #framework #generative_ai #grpc #jaeger #kubernetes #llmops #machine_learning #microservice #mlops #multimodal #neural_search #opentelemetry #orchestration #pipeline #prometheus

Jina-serve is a tool that helps you build and deploy AI services easily. It supports major machine learning frameworks and allows you to scale your services from local development to production quickly. You can use it to create AI services that communicate via gRPC, HTTP, and WebSockets. It has features like built-in Docker integration, one-click cloud deployment, and support for Kubernetes and Docker Compose, making it easy to manage and scale your AI applications. This makes it simpler for you to focus on the core logic of your AI projects without worrying about the technical details of deployment and scaling.

https://github.com/jina-ai/serve

☁️ Build multimodal AI applications with cloud-native stack - jina-ai/serve

456 views · 15:00

GitHub Trends

#python #cleandata #data_engineering #data_profilers #data_profiling #data_quality #data_science #data_unit_tests #datacleaner #datacleaning #dataquality #dataunittest #eda #exploratory_analysis #exploratory_data_analysis #exploratorydataanalysis #mlops #pipeline #pipeline_debt #pipeline_testing #pipeline_tests

GX Core is a powerful tool for ensuring data quality. It allows you to write simple tests, called "Expectations," to check if your data meets certain standards. This helps teams work together more effectively and keeps everyone informed about the data's quality. You can automatically generate reports, making it easy to share results and preserve your organization's knowledge about its data. To get started, you just need to install GX Core in a Python virtual environment and follow some simple steps. This makes managing data quality much simpler and more efficient.
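
The idea behind an "Expectation" can be sketched in a few lines of plain Python. This is not the GX Core API: `expect_not_null`, `expect_between`, and the sample rows are hypothetical, and the real library ships its own expectation classes and result objects.

```python
# Toy data-quality checks. NOT the GX Core API: all names are hypothetical.

def expect_not_null(rows, column):
    """Check that no row has a missing value in `column`."""
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"success": not unexpected, "unexpected_rows": unexpected}


def expect_between(rows, column, low, high):
    """Check that every non-missing value in `column` lies in [low, high]."""
    unexpected = [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return {"success": not unexpected, "unexpected_rows": unexpected}


rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 210},
]

null_check = expect_not_null(rows, "age")
range_check = expect_between(rows, "age", 0, 120)
id_check = expect_not_null(rows, "id")
print(null_check, range_check, id_check)
```

Each check returns a small, shareable result rather than raising, which is what makes it possible to build reports from many checks at once.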

https://github.com/great-expectations/great_expectations

Always know what to expect from your data. - great-expectations/great_expectations

612 views · 12:30

GitHub Trends

#python #ai #big_model #data_parallelism #deep_learning #distributed_computing #foundation_models #heterogeneous_training #hpc #inference #large_scale #model_parallelism #pipeline_parallelism

Colossal-AI is a tool that aims to make large AI models faster, cheaper, and easier to use. It uses techniques such as parallelism to speed up training of big models without needing the most expensive hardware. This means users can train complex AI models on more modest hardware, saving time and money. Colossal-AI also supports applications across areas like medicine, video generation, and chatbots, making it versatile for developers.

https://github.com/hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible - hpcaitech/ColossalAI

544 views · 00:00

GitHub Trends

#java #automation #data_orchestration #devops #high_availability #infrastructure_as_code #java #low_code #lowcode #orchestration #pipeline #pipeline_as_code #workflow

Kestra is an open-source platform that helps manage complex workflows. Workflows are defined in simple YAML and can be triggered on schedules or by real-time events. Kestra supports many plugins, allowing integration with various data sources and tools, which makes it easy to automate tasks like data processing and infrastructure management. The platform is scalable, fault-tolerant, and offers real-time monitoring, making it useful for teams handling large data pipelines and complex workflows.

https://github.com/kestra-io/kestra

Orchestrate everything - from scripts to data, infra, AI, and business - as code, with UI and AI Copilot. Simple. Fast. Scalable. - kestra-io/kestra

588 views · 12:30

GitHub Trends

#rust #ai #change_data_capture #context_engineering #data #data_engineering #data_indexing #data_infrastructure #data_processing #etl #hacktoberfest #help_wanted #indexing #knowledge_graph #llm #pipeline #python #rag #real_time #rust #semantic_search

**CocoIndex** is a fast, open-source Python tool (Rust core) for transforming data into AI formats like vector indexes or knowledge graphs. Define simple data flows in ~100 lines of code using plug-and-play blocks for sources, embeddings, and targets—install via `pip install cocoindex`, add Postgres, and run. It auto-syncs fresh data with minimal recompute on changes, tracking lineage. **You save time building scalable RAG/semantic search pipelines effortlessly, avoiding complex ETL and stale data issues for production-ready AI apps.**

https://github.com/cocoindex-io/cocoindex

Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it! - cocoindex-io/cocoindex

329 views · 11:30
