#python #deep_learning #framework #inference #keras #layers #library #machine_learning #pytorch #shape #sota
https://github.com/szymonmaszke/torchlayers
GitHub - szymonmaszke/torchlayers: Shape and dimension inference (Keras-like) for PyTorch layers and neural networks
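A minimal sketch of what the Keras-like shape inference looks like in use, following the project README (`tl.Conv`, `tl.GlobalMaxPool`, and `tl.build` come from there; only output sizes are specified, input sizes are inferred):

```python
import torch
import torchlayers as tl

# torch.nn and torchlayers modules can be mixed freely; input channel
# and feature counts are left out and inferred at build time.
model = torch.nn.Sequential(
    tl.Conv(64),         # only out_channels given; in_channels inferred
    torch.nn.ReLU(),
    tl.Conv(128),
    torch.nn.ReLU(),
    tl.GlobalMaxPool(),  # collapses spatial dimensions, Keras-style
    tl.Linear(10),       # in_features inferred
)

# build() runs a dummy forward pass to materialize all shapes.
model = tl.build(model, torch.randn(1, 3, 32, 32))
print(model)
```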
#cplusplus #deep_learning #inference #inference_engine #openvino #performance
https://github.com/openvinotoolkit/openvino
GitHub - openvinotoolkit/openvino: OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
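A hedged sketch of the basic OpenVINO Python workflow (2023.x-style `openvino` API; the IR file path and input shape are hypothetical):

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # hypothetical IR model file
compiled = core.compile_model(model, "CPU")  # device could also be "GPU" or "AUTO"

# Run one inference on random data; 1x3x224x224 is an assumed input shape.
request = compiled.create_infer_request()
input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = request.infer({0: input_tensor})
print(next(iter(result.values())).shape)
```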
#jupyter_notebook #computer_vision #deep_learning #inference #machine_learning #openvino
https://github.com/openvinotoolkit/openvino_notebooks
GitHub - openvinotoolkit/openvino_notebooks: 📚 Jupyter notebook tutorials for OpenVINO™
#cuda #training #inference #transformer #bart #beam_search #sampling #bert #multilingual_nmt #gpt_2 #diverse_decoding
https://github.com/bytedance/lightseq
GitHub - bytedance/lightseq: LightSeq: A High Performance Library for Sequence Processing and Generation
#python #blockchain #deep_learning #distributed_training #edge_ai #federated_learning #inference_engine #machine_learning #marketplace #mlops #on_device_training #privacy #security
https://github.com/FedML-AI/FedML
GitHub - FedML-AI/FedML: FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning.
#python #bloom #deep_learning #gpt #inference #nlp #pytorch #transformer
https://github.com/huggingface/text-generation-inference
GitHub - huggingface/text-generation-inference: Large Language Model Text Generation Inference
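TGI is consumed over HTTP; a hedged sketch of querying a locally running server with the `huggingface_hub` client (the endpoint URL assumes the port 8080 mapping used in the project's Docker examples):

```python
from huggingface_hub import InferenceClient

# Point the client at a running TGI server (assumed local endpoint).
client = InferenceClient("http://localhost:8080")

answer = client.text_generation(
    "What is deep learning?",
    max_new_tokens=64,
    temperature=0.7,
)
print(answer)
```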
#python #graphcore #habana #inference #intel #onnx #onnxruntime #optimization #pytorch #quantization #tflite #training #transformers
https://github.com/huggingface/optimum
GitHub - huggingface/optimum: 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
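A minimal sketch of Optimum's ONNX Runtime path, per its documented `ORTModel` classes (the checkpoint name is just an example):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint

# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum makes ONNX Runtime easy to use."))
```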
#jupyter_notebook #ai #aihub #argo #automl #gpt #inference #kubeflow #kubernetes #llmops #mlops #notebook #pipeline #pytorch #spark #vgpu #workflow
https://github.com/tencentmusic/cube-studio
GitHub - tencentmusic/cube-studio: cube studio is an open-source, cloud-native, one-stop machine learning/deep learning/large-model AI platform: end-to-end MLOps pipelines, compute rental, online notebook development, drag-and-drop workflow pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with VGPU virtualization, edge computing, automated data labeling, SFT fine-tuning/reward-model/reinforcement-learning training of large models such as DeepSeek, multi-node large-model inference with vllm/ollama/mindie, private knowledge bases, and an AI model marketplace.
#python #billion_parameters #compression #data_parallelism #deep_learning #gpu #inference #machine_learning #mixture_of_experts #model_parallelism #pipeline_parallelism #pytorch #trillion_parameters #zero
DeepSpeed is a deep learning optimization library for training and serving very large models quickly and efficiently. It lets you train models with billions or even trillions of parameters far faster and more cheaply than conventional training stacks; for example, it can train ChatGPT-like models 15 times faster than current state-of-the-art systems. This makes it practical to work with large language models without massive resources, making AI more accessible and efficient.
https://github.com/microsoft/DeepSpeed
GitHub - deepspeedai/DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
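A minimal hedged sketch of wrapping a PyTorch model with `deepspeed.initialize`; the config values are illustrative, not tuned recommendations, and scripts are normally launched with the `deepspeed` CLI launcher:

```python
import torch
import deepspeed

model = torch.nn.Linear(512, 512)  # stand-in for a real network

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # ZeRO-2: partition grads + optimizer states
    "fp16": {"enabled": True},
}

# The returned engine handles the optimizer, mixed precision, and ZeRO.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# In the training loop, the engine replaces loss.backward()/optimizer.step():
#   loss = loss_fn(model_engine(inputs), labels)
#   model_engine.backward(loss)
#   model_engine.step()
```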
#swift #inference #ios #macos #pretrained_models #speech_recognition #transformers #visionos #watchos #whisper
WhisperKit is a tool that helps your Apple devices recognize speech from audio files or live recordings using OpenAI's Whisper model. It runs locally on your device, so it doesn't need an internet connection once set up. To use it, you can add WhisperKit to your Swift project through the Swift Package Manager or install a command-line version with Homebrew. This lets you transcribe audio quickly and efficiently right on the device, which is useful for applications like voice assistants or transcription services.
https://github.com/argmaxinc/WhisperKit
GitHub - argmaxinc/WhisperKit: On-device Speech Recognition for Apple Silicon
#cplusplus #android #audio_processing #c_plus_plus #calculator #computer_vision #deep_learning #framework #graph_based #graph_framework #inference #machine_learning #mediapipe #mobile_development #perception #pipeline_framework #stream_processing #video_processing
MediaPipe is a tool that helps you add smart machine learning features to your apps and devices. It works on mobile, web, desktop, and other devices. You can use pre-made solutions for tasks like vision, text, and audio processing, or customize the models to fit your needs. MediaPipe also offers tools like Model Maker and Studio to help you create and test your solutions easily. This makes it easier to delight your customers with innovative features without needing deep machine learning expertise.
https://github.com/google-ai-edge/mediapipe
GitHub - google-ai-edge/mediapipe: Cross-platform, customizable ML solutions for live and streaming media.
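A minimal sketch using MediaPipe's Python "solutions" API for hand landmarks (the image path is hypothetical):

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

image = cv2.imread("hand.jpg")  # hypothetical input image
with mp_hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
    # MediaPipe expects RGB input; OpenCV loads images as BGR.
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_hand_landmarks:
    for hand in results.multi_hand_landmarks:
        # Print the normalized coordinates of one landmark per detected hand.
        print(hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP])
```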
#jupyter_notebook #aws #data_science #deep_learning #examples #inference #machine_learning #mlops #reinforcement_learning #sagemaker #training
SageMaker-Core is a new Python SDK for Amazon SageMaker that makes it easier to work with machine learning resources. It provides an object-oriented interface, which means you can manage resources like training jobs, models, and endpoints more intuitively. The SDK simplifies code by allowing resource chaining, eliminating the need to manually specify parameters. It also includes features like auto code completion, comprehensive documentation, and type hints, making it faster and less error-prone to write code. This helps developers customize their ML workloads more efficiently and streamline their development process.
https://github.com/aws/amazon-sagemaker-examples
GitHub - aws/amazon-sagemaker-examples: Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
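The summary above covers the newer SageMaker-Core SDK, but most of these notebooks use the higher-level `sagemaker` Python SDK; a hedged sketch of its Estimator pattern (the role ARN, image URI, and S3 paths are hypothetical placeholders):

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",  # hypothetical
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",              # hypothetical
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

# Launch a training job against data in S3, then deploy an endpoint.
estimator.fit({"train": "s3://my-bucket/train"})  # hypothetical bucket
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```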
#python #amd #cuda #gpt #inference #inferentia #llama #llm #llm_serving #llmops #mlops #model_serving #pytorch #rocm #tpu #trainium #transformer #xpu
vLLM is a library that makes it easy, fast, and cheap to use large language models (LLMs). It is designed to be fast with features like efficient memory management, continuous batching, and optimized CUDA kernels. vLLM supports many popular models and can run on various hardware including NVIDIA GPUs, AMD CPUs and GPUs, and more. It also offers seamless integration with Hugging Face models and supports different decoding algorithms. This makes it flexible and easy to use for anyone needing to serve LLMs, whether for research or other applications. You can install vLLM easily with `pip install vllm` and find detailed documentation on their website.
https://github.com/vllm-project/vllm
GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
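A minimal sketch of offline batch generation with vLLM's Python API, per the project quickstart (the model name is just an example checkpoint):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example small model
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches prompts and schedules them with continuous batching.
outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```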
#cplusplus #caffe #convolution #deep_learning #deep_neural_networks #diy #graph_algorithms #inference #inference_engine #maxpooling #ncnn #pnnx #pytorch #relu #resnet #sigmoid #yolo #yolov5
This course, "_动手自制大模型推理框架_" (Handcrafting a Large Model Inference Framework), is a valuable resource for anyone interested in deep learning and model inference. It teaches you how to build a modern C++ project from scratch, focusing on designing and implementing a deep learning inference framework. The course supports the latest models such as Llama 3.2 and Qwen 2.5, and uses CUDA acceleration and Int8 quantization for better performance.
By taking this course, you will learn how to write efficient C++ code, manage projects with CMake and Git, design computational graphs, implement common operators like convolution and pooling, and optimize them for speed. This knowledge is highly useful for job interviews and for advancing your deep learning skills. The course also includes hands-on demos of models like Unet and YoloV5.
https://github.com/zjhellofss/KuiperInfer
GitHub - zjhellofss/KuiperInfer: A great project for campus recruitment and internships! Implement a high-performance deep learning inference library from scratch, step by step, with support for inference of large models like llama2 as well as Unet, Yolov5, and Resnet.
#shell #ai #containers #inference_server #llamacpp #llm #podman #vllm
RamaLama is a tool that makes working with AI models easy by using containers. It checks your system for GPU support and uses CPU if no GPU is found. RamaLama uses container engines like Podman or Docker to run AI models, so you don't need to configure your system. You can pull and run AI models from various registries with simple commands, and it supports multiple types of hardware including CPUs and GPUs. This makes it convenient for users as they don't have to set up complex environments, and they can interact with different models easily.
https://github.com/containers/ramalama
GitHub - containers/ramalama: RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production.
#python #cuda #deepseek #deepseek_llm #deepseek_v3 #inference #llama #llama2 #llama3 #llama3_1 #llava #llm #llm_serving #moe #pytorch #transformer #vlm
SGLang is a tool that makes working with large language models and vision language models much faster and more manageable. It has a fast backend runtime that optimizes model performance with features like prefix caching, continuous batching, and quantization. The frontend language is flexible and easy to use, allowing for complex tasks like chained generation calls and multi-modal inputs. SGLang supports many different models and has an active community behind it. This means you can get your models running quickly and efficiently, saving time and resources. Additionally, the extensive documentation and community support make it easier to get started and resolve any issues.
https://github.com/sgl-project/sglang
GitHub - sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.
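A hedged sketch of SGLang's frontend DSL for a chained generation call, per the project README; it assumes an SGLang server is already running locally on port 30000:

```python
import sglang as sgl

# Connect the frontend to a running SGLang runtime (assumed local endpoint).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

state = qa.run(question="What is continuous batching?")
print(state["answer"])
```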
#c_lang #convolutional_neural_network #convolutional_neural_networks #cpu #inference #inference_optimization #matrix_multiplication #mobile_inference #multithreading #neural_network #neural_networks #simd
XNNPACK is a highly optimized library that makes neural networks run faster on a wide range of devices, from smartphones and desktop computers to Raspberry Pi boards, across many processor architectures and operating systems. It is not meant to be used directly by end users; instead, it serves as a low-level backend that accelerates machine learning frameworks such as TensorFlow Lite, PyTorch, and ONNX Runtime. Apps and programs built on those frameworks can therefore run neural networks more quickly and efficiently, saving time and improving performance.
https://github.com/google/XNNPACK
GitHub - google/XNNPACK: High-efficiency floating-point neural network inference operators for mobile, server, and Web
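XNNPACK is reached through frameworks rather than called directly; a hedged sketch with TensorFlow Lite, whose recent builds apply the XNNPACK delegate by default when creating an interpreter (the model path is hypothetical):

```python
import numpy as np
import tensorflow as tf

# Recent TFLite builds route float ops through XNNPACK automatically.
interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()

out = interpreter.get_output_details()[0]
print(interpreter.get_tensor(out["index"]).shape)
```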