GitHub Trends
10.1K subscribers
15.3K links
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
#python #billion_parameters #compression #data_parallelism #deep_learning #gpu #inference #machine_learning #mixture_of_experts #model_parallelism #pipeline_parallelism #pytorch #trillion_parameters #zero

DeepSpeed is a deep learning optimization library for training and serving very large AI models quickly and efficiently. It lets you train models with billions or even trillions of parameters faster and more cheaply than conventional training stacks. With DeepSpeed, you can achieve significant speedups, cut costs, and improve the performance of your models; for example, it can train ChatGPT-like models 15 times faster than prior state-of-the-art systems. This lowers the resource barrier to working with large language models, making AI more accessible and efficient for everyone.
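
As a rough illustration, here is a minimal sketch of wrapping a PyTorch model with DeepSpeed's `initialize` call and a ZeRO stage-2 config (the model and hyperparameters are placeholders; training is normally launched with the `deepspeed` launcher):

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

# Minimal config enabling ZeRO stage-2 optimizer-state/gradient partitioning.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},
}

# Returns an engine that handles mixed precision, ZeRO partitioning, etc.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```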

https://github.com/microsoft/DeepSpeed
#swift #inference #ios #macos #pretrained_models #speech_recognition #swift #transformers #visionos #watchos #whisper

WhisperKit is a Swift package that lets Apple devices transcribe speech from audio files or live recordings using OpenAI's Whisper model. It runs entirely on-device, so it doesn't need an internet connection once set up. You can add WhisperKit to a Swift project through the Swift Package Manager or install a command-line version with Homebrew. Because transcription happens locally, it is fast and private, which makes it useful for applications like voice assistants and transcription services.

https://github.com/argmaxinc/WhisperKit
#cplusplus #android #audio_processing #c_plus_plus #calculator #computer_vision #deep_learning #framework #graph_based #graph_framework #inference #machine_learning #mediapipe #mobile_development #perception #pipeline_framework #stream_processing #video_processing

MediaPipe is a tool that helps you add smart machine learning features to your apps and devices. It works on mobile, web, desktop, and other devices. You can use pre-made solutions for tasks like vision, text, and audio processing, or customize the models to fit your needs. MediaPipe also offers tools like Model Maker and Studio to help you create and test your solutions easily. This makes it easier to delight your customers with innovative features without needing deep machine learning expertise.
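
For a taste of the Python API, here is a minimal sketch using the classic hand-landmark solution (the blank frame stands in for a real camera image):

```python
import numpy as np
import mediapipe as mp

# Placeholder for an RGB camera frame.
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# The Hands solution runs a hand-landmark detection graph on the image.
with mp.solutions.hands.Hands(static_image_mode=True) as hands:
    results = hands.process(frame)
    print(results.multi_hand_landmarks)  # None here, since the frame is blank
```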

https://github.com/google-ai-edge/mediapipe
#jupyter_notebook #aws #data_science #deep_learning #examples #inference #jupyter_notebook #machine_learning #mlops #reinforcement_learning #sagemaker #training

SageMaker-Core is a new Python SDK for Amazon SageMaker that makes it easier to work with machine learning resources. It provides an object-oriented interface, so you can manage resources like training jobs, models, and endpoints as Python objects rather than loose dictionaries of parameters. Resource chaining lets one resource's outputs feed directly into the next call, so you don't have to re-specify shared parameters. The SDK also includes auto code completion, comprehensive documentation, and type hints, making code faster and less error-prone to write, which helps developers customize their ML workloads and streamline their development process.
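
As a hypothetical sketch of that resource-oriented style (class and method names below are assumptions based on the SDK's description, not verified API):

```python
# Hypothetical sketch: names are assumptions, not verified sagemaker-core API.
from sagemaker_core.resources import TrainingJob

job = TrainingJob.get("my-training-job")  # fetch an existing resource as an object
print(job.training_job_status)            # typed attributes instead of dict keys
job.wait()                                # chain follow-up calls on the same object
```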

https://github.com/aws/amazon-sagemaker-examples
#python #amd #cuda #gpt #inference #inferentia #llama #llm #llm_serving #llmops #mlops #model_serving #pytorch #rocm #tpu #trainium #transformer #xpu

vLLM is a library that makes it easy, fast, and cheap to serve large language models (LLMs). Its speed comes from features like efficient memory management with PagedAttention, continuous batching, and optimized CUDA kernels. vLLM supports many popular models and runs on a range of hardware, including NVIDIA GPUs, AMD CPUs and GPUs, and more. It also integrates seamlessly with Hugging Face models and supports multiple decoding algorithms, making it flexible and easy to use for anyone serving LLMs, whether for research or production. You can install vLLM with `pip install vllm` and find detailed documentation on the project website.
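
The offline API is only a few lines; a minimal sketch (the model is chosen small for a quick test):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any supported Hugging Face model ID works
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```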

https://github.com/vllm-project/vllm
#cplusplus #caffe #convolution #deep_learning #deep_neural_networks #diy #graph_algorithms #inference #inference_engine #maxpooling #ncnn #pnnx #pytorch #relu #resnet #sigmoid #yolo #yolov5

This course, "动手自制大模型推理框架" (Handcrafting a Large Model Inference Framework), is a valuable resource for those interested in deep learning and model inference. It teaches you how to build a modern C++ project from scratch, focusing on designing and implementing a deep learning inference framework. The course supports recent models such as Llama 3.2 and Qwen 2.5, and uses CUDA acceleration and Int8 quantization for better performance.

By taking this course, you will learn how to write efficient C++ code, manage projects with CMake and Git, design computational graphs, implement common operators like convolution and pooling, and optimize them for speed. These skills are valuable for job interviews and for advancing in deep learning. The course also includes hands-on demos of models such as UNet and YOLOv5, making it a practical learning experience.

https://github.com/zjhellofss/KuiperInfer
#shell #ai #containers #inference_server #llamacpp #llm #podman #vllm

RamaLama is a tool that makes working with AI models easy by running them in containers. It detects GPU support on your system and falls back to CPU inference when no GPU is found. Because it uses container engines like Podman or Docker, you don't have to configure your host system or set up a complex environment. You can pull and run AI models from various registries with simple commands, and it works across different hardware, making it easy to try out and switch between models.

https://github.com/containers/ramalama
#python #cuda #deepseek #deepseek_llm #deepseek_v3 #inference #llama #llama2 #llama3 #llama3_1 #llava #llm #llm_serving #moe #pytorch #transformer #vlm

SGLang is a tool that makes working with large language models and vision language models much faster and more manageable. It has a fast backend runtime that optimizes model performance with features like prefix caching, continuous batching, and quantization. The frontend language is flexible and easy to use, allowing for complex tasks like chained generation calls and multi-modal inputs. SGLang supports many different models and has an active community behind it. This means you can get your models running quickly and efficiently, saving time and resources. Additionally, the extensive documentation and community support make it easier to get started and resolve any issues.
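
A minimal sketch of the frontend DSL, assuming an SGLang server is already running locally (the port and question are illustrative):

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    # Build a chat-style prompt and generate an answer into the "answer" slot.
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

# Point the frontend at a locally launched SGLang server.
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = qa.run(question="What is continuous batching?")
print(state["answer"])
```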

https://github.com/sgl-project/sglang
#c_lang #convolutional_neural_network #convolutional_neural_networks #cpu #inference #inference_optimization #matrix_multiplication #mobile_inference #multithreading #neural_network #neural_networks #simd

XNNPACK is a highly optimized library that makes neural networks run faster on devices ranging from smartphones and desktop computers to Raspberry Pi boards, and it supports many processor architectures and operating systems. It isn't used directly by application developers; instead it serves as a low-level backend for machine learning frameworks like TensorFlow Lite, PyTorch, and ONNX Runtime. Apps and programs built on those frameworks can then run neural networks more quickly and efficiently, which saves time and improves performance.

https://github.com/google/XNNPACK
#python #ai #big_model #data_parallelism #deep_learning #distributed_computing #foundation_models #heterogeneous_training #hpc #inference #large_scale #model_parallelism #pipeline_parallelism

Colossal-AI is a powerful tool that helps make large AI models faster, cheaper, and easier to use. It applies techniques such as data, tensor, and pipeline parallelism, along with heterogeneous memory management, to speed up training of big models without requiring expensive hardware. This means users can train complex AI models even on modest machines, saving time and money. Colossal-AI also supports various applications across industries like medicine, video generation, and chatbots, making it very versatile for developers.
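
A rough sketch of its Booster API with a plain DDP plugin (assumes launching via `torchrun`; exact signatures may vary between releases):

```python
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchDDPPlugin

colossalai.launch_from_torch()  # reads rank/world size from torchrun env vars

model = torch.nn.Linear(512, 512)  # stand-in for a real model
optimizer = torch.optim.Adam(model.parameters())

# Booster wraps the model/optimizer according to the chosen parallelism plugin;
# swapping the plugin changes the parallelism strategy without touching the loop.
booster = Booster(plugin=TorchDDPPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)
```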

https://github.com/hpcaitech/ColossalAI
#jupyter_notebook #computer_vision #deep_learning #inference #machine_learning #openvino

OpenVINO Notebooks are a collection of interactive Jupyter notebooks that help developers learn and experiment with the OpenVINO Toolkit. These notebooks provide an introduction to OpenVINO basics and show how to optimize deep learning inference using the API. They can be run on various platforms, including Windows, Ubuntu, macOS, and cloud services like Azure ML or Google Colab. This makes it easy for users to get started with AI development without needing extensive hardware knowledge, allowing them to focus on building applications efficiently across different devices.
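
The core inference flow the notebooks build on is short; a minimal sketch (the IR file path is a placeholder):

```python
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU']

model = core.read_model("model.xml")         # placeholder path to an exported IR model
compiled = core.compile_model(model, "CPU")  # compile for any listed device
```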

https://github.com/openvinotoolkit/openvino_notebooks
#typescript #api_client #hub #huggingface #inference #machine_learning

Hugging Face offers JavaScript libraries that let you easily use over 100,000 AI models for tasks like text generation, image creation, translation, and more, directly in your code or browser. You can create and manage model repositories, upload files, and run AI tasks such as chat completions or text-to-image generation with simple commands. These libraries work on modern environments without extra dependencies and support multiple providers, giving you flexible access to powerful AI tools. This helps you quickly add advanced AI features to your projects without deep AI expertise or complex setup.

https://github.com/huggingface/huggingface.js
#python #deep_learning #inference #llm #nlp #pytorch #transformer

Nano-vLLM is a small, fast, and easy-to-understand tool for running large language models offline. It matches the speed of bigger systems like vLLM but uses only about 1,200 lines of clean Python code, making it simple to read and modify. It includes smart features like prefix caching and tensor parallelism to boost performance. You can install it easily and run models like Qwen3-0.6B on your own GPU. This tool is great if you want fast, efficient AI inference without complex setups, ideal for learning, research, or small deployments on limited hardware.
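
Its API deliberately mirrors vLLM's offline interface; a sketch assuming the README-style usage (model name and output shape per the project README):

```python
from nanovllm import LLM, SamplingParams

llm = LLM("Qwen/Qwen3-0.6B", enforce_eager=True)  # path or HF ID of a small model
params = SamplingParams(temperature=0.6, max_tokens=128)

outputs = llm.generate(["Explain prefix caching in one sentence."], params)
print(outputs[0]["text"])  # assumed output shape, per the README examples
```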

https://github.com/GeeeekExplorer/nano-vllm
#python #audio_generation #diffusion #image_generation #inference #model_serving #multimodal #pytorch #transformer #video_generation

vLLM-Omni is a free, open-source tool that makes serving AI models for text, images, video, and audio fast, easy, and cheap. It builds on vLLM for speed, combining its memory management with overlapped scheduling and flexible resource sharing across GPUs. The project reports up to 2x higher throughput and 35% lower latency, with simple setup for Hugging Face models behind an OpenAI-compatible API, which makes it practical for building multi-modal apps like chatbots or media generators without high costs.
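
Since serving is OpenAI-compatible (as in vLLM), a client sketch might look like this (the endpoint and model name are assumptions):

```python
from openai import OpenAI

# Assumes a local vLLM-Omni server exposing an OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # illustrative model name
    messages=[{"role": "user", "content": "Write a one-line image caption."}],
)
print(resp.choices[0].message.content)
```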

https://github.com/vllm-project/vllm-omni