#python #cuda #deepseek #deepseek_llm #deepseek_v3 #inference #llama #llama2 #llama3 #llama3_1 #llava #llm #llm_serving #moe #pytorch #transformer #vlm
SGLang is a tool that makes working with large language models and vision language models much faster and more manageable. It has a fast backend runtime that optimizes model performance with features like prefix caching, continuous batching, and quantization. The frontend language is flexible and easy to use, allowing for complex tasks like chained generation calls and multi-modal inputs. SGLang supports many different models and has an active community behind it. This means you can get your models running quickly and efficiently, saving time and resources. Additionally, the extensive documentation and community support make it easier to get started and resolve any issues.
https://github.com/sgl-project/sglang
SGLang is a tool that makes working with large language models and vision language models much faster and more manageable. It has a fast backend runtime that optimizes model performance with features like prefix caching, continuous batching, and quantization. The frontend language is flexible and easy to use, allowing for complex tasks like chained generation calls and multi-modal inputs. SGLang supports many different models and has an active community behind it. This means you can get your models running quickly and efficiently, saving time and resources. Additionally, the extensive documentation and community support make it easier to get started and resolve any issues.
https://github.com/sgl-project/sglang
GitHub
GitHub - sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.
SGLang is a fast serving framework for large language models and vision language models. - sgl-project/sglang
#python #llm #multimodal_large_language_models #svg #vlm
StarVector is a powerful tool that converts images into Scalable Vector Graphics (SVG) code. It uses a special kind of AI called a multimodal vision-language model to understand both images and text. This means it can create SVGs from pictures or text instructions. The benefit is that SVGs are scalable and editable, making them perfect for web design and graphic art. StarVector is especially good at vectorizing icons, logos, and diagrams, producing high-quality results that are easy to edit and resize without losing clarity[1][3][5].
https://github.com/joanrod/star-vector
StarVector is a powerful tool that converts images into Scalable Vector Graphics (SVG) code. It uses a special kind of AI called a multimodal vision-language model to understand both images and text. This means it can create SVGs from pictures or text instructions. The benefit is that SVGs are scalable and editable, making them perfect for web design and graphic art. StarVector is especially good at vectorizing icons, logos, and diagrams, producing high-quality results that are easy to edit and resize without losing clarity[1][3][5].
https://github.com/joanrod/star-vector
GitHub
GitHub - joanrod/star-vector: StarVector is a foundation model for SVG generation that transforms vectorization into a code generation…
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and te...
#typescript #agent #browser_use #computer_use #electron #gui_agents #mcp #mcp_server #vision #vite #vlm
Agent TARS is a powerful tool that helps automate tasks using AI. It integrates with many tools and can handle complex tasks like web scraping and data analysis. This makes it easier to manage workflows and reduces errors. Users can automate tasks in just a few steps, making it very efficient. Agent TARS also supports advanced browser operations and has a user-friendly desktop app, which makes it easy to use for anyone. Overall, it helps users save time and work more efficiently.
https://github.com/bytedance/UI-TARS-desktop
Agent TARS is a powerful tool that helps automate tasks using AI. It integrates with many tools and can handle complex tasks like web scraping and data analysis. This makes it easier to manage workflows and reduces errors. Users can automate tasks in just a few steps, making it very efficient. Agent TARS also supports advanced browser operations and has a user-friendly desktop app, which makes it easy to use for anyone. Overall, it helps users save time and work more efficiently.
https://github.com/bytedance/UI-TARS-desktop
GitHub
GitHub - bytedance/UI-TARS-desktop: The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra - bytedance/UI-TARS-desktop
#python #ernie #ernie_45 #ernie_45_vl #erniekit #llm #vlm
ERNIE 4.5 is a powerful AI model family that understands and generates text, images, and videos together, thanks to its special design that shares knowledge across these types without losing quality. It includes large models with billions of parameters and smaller efficient ones, all trained using the PaddlePaddle framework for fast and effective use. ERNIE 4.5 excels in tasks like language understanding, visual reasoning, and following instructions, often outperforming other top models. It also offers tools for easy training and deployment on various hardware. This means you can use ERNIE 4.5 for advanced AI applications involving text and visuals with high accuracy and efficiency, supported by open-source resources for customization and development[1][3][5].
https://github.com/PaddlePaddle/ERNIE
ERNIE 4.5 is a powerful AI model family that understands and generates text, images, and videos together, thanks to its special design that shares knowledge across these types without losing quality. It includes large models with billions of parameters and smaller efficient ones, all trained using the PaddlePaddle framework for fast and effective use. ERNIE 4.5 excels in tasks like language understanding, visual reasoning, and following instructions, often outperforming other top models. It also offers tools for easy training and deployment on various hardware. This means you can use ERNIE 4.5 for advanced AI applications involving text and visuals with high accuracy and efficiency, supported by open-source resources for customization and development[1][3][5].
https://github.com/PaddlePaddle/ERNIE
GitHub
GitHub - PaddlePaddle/ERNIE: The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based…
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle. - PaddlePaddle/ERNIE
#python #document_analysis #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_parser #python #vlm_ocr
Dolphin is a smart AI tool that can analyze and understand complex document images, like pages with text, tables, formulas, and pictures. It works in two steps: first, it figures out the layout and reading order of the page; then, it quickly parses each element using special prompts. This makes it fast and accurate for turning document images into structured data like JSON or Markdown. You can use pre-trained models and easy code to process single pages, PDFs, or specific elements. This helps you save time and effort when extracting information from complicated documents efficiently.
https://github.com/bytedance/Dolphin
Dolphin is a smart AI tool that can analyze and understand complex document images, like pages with text, tables, formulas, and pictures. It works in two steps: first, it figures out the layout and reading order of the page; then, it quickly parses each element using special prompts. This makes it fast and accurate for turning document images into structured data like JSON or Markdown. You can use pre-trained models and easy code to process single pages, PDFs, or specific elements. This helps you save time and effort when extracting information from complicated documents efficiently.
https://github.com/bytedance/Dolphin
GitHub
GitHub - bytedance/Dolphin: The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025. - bytedance/Dolphin
#go #gemma3 #go #gpt_oss #granite4 #llama #llama3 #llm #on_device_ai #phi3 #qwen3 #qwen3vl #sdk #stable_diffusion #vlm
NexaSDK runs AI models locally on CPUs, GPUs, and NPUs with a single command, supports GGUF/MLX/.nexa formats, and offers NPU-first Android and macOS support for fast, multimodal (text, image, audio) inference, plus an OpenAI‑compatible API for easy integration. This gives you low-latency, private on-device AI across laptops, phones, and embedded systems, reduces cloud costs and data exposure, and lets you deploy and test new models immediately on target hardware for faster development and better user experience.
https://github.com/NexaAI/nexa-sdk
NexaSDK runs AI models locally on CPUs, GPUs, and NPUs with a single command, supports GGUF/MLX/.nexa formats, and offers NPU-first Android and macOS support for fast, multimodal (text, image, audio) inference, plus an OpenAI‑compatible API for easy integration. This gives you low-latency, private on-device AI across laptops, phones, and embedded systems, reduces cloud costs and data exposure, and lets you deploy and test new models immediately on target hardware for faster development and better user experience.
https://github.com/NexaAI/nexa-sdk
GitHub
GitHub - NexaAI/nexa-sdk: Run the latest LLMs and VLMs across GPU, NPU, and CPU with PC (Python/C++) & mobile (Android & iOS) support…
Run the latest LLMs and VLMs across GPU, NPU, and CPU with PC (Python/C++) & mobile (Android & iOS) support, running quickly with OpenAI gpt-oss, Granite4, Qwen3VL, Gemma 3n and mor...