GitHub Trends

#python #amd #cuda #gpt #inference #inferentia #llama #llm #llm_serving #llmops #mlops #model_serving #pytorch #rocm #tpu #trainium #transformer #xpu

vLLM is a library that makes it easy, fast, and cheap to use large language models (LLMs). It is designed to be fast with features like efficient memory management, continuous batching, and optimized CUDA kernels. vLLM supports many popular models and can run on various hardware including NVIDIA GPUs, AMD CPUs and GPUs, and more. It also offers seamless integration with Hugging Face models and supports different decoding algorithms. This makes it flexible and easy to use for anyone needing to serve LLMs, whether for research or other applications. You can install vLLM easily with `pip install vllm` and find detailed documentation on their website.

https://github.com/vllm-project/vllm

GitHub

GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs

A high-throughput and memory-efficient inference and serving engine for LLMs - vllm-project/vllm

❤1

373 views13:00

GitHub Trends

#python #audio_generation #diffusion #image_generation #inference #model_serving #multimodal #pytorch #transformer #video_generation

vLLM-Omni is a free, open-source tool that makes serving AI models for text, images, videos, and audio fast, easy, and cheap. It builds on vLLM for top speed using smart memory tricks, overlapping tasks, and flexible resource sharing across GPUs. You get 2x higher throughput, 35% less delay, and simple setup with Hugging Face models via OpenAI API—perfect for building quick multi-modal apps like chatbots or media generators without high costs.

https://github.com/vllm-project/vllm-omni

GitHub

GitHub - vllm-project/vllm-omni: A framework for efficient model inference with omni-modality models

A framework for efficient model inference with omni-modality models - vllm-project/vllm-omni

265 views15:30

About

Blog

Apps

Platform