GitHub Trends

#python #beit #beit_3 #bitnet #deepnet #document_ai #foundation_models #kosmos #kosmos_1 #layoutlm #layoutxlm #llm #minilm #mllm #multimodal #nlp #pre_trained_model #textdiffuser #trocr #unilm #xlm_e

The microsoft/unilm repository collects Microsoft's work on large-scale self-supervised pre-training across tasks, languages, and modalities. It includes general-purpose architectures such as Foundation Transformers (Magneto) and multimodal models such as Kosmos-2.5, covering language understanding, vision, speech, and multimodal interaction. The models report state-of-the-art results in areas including document AI, speech recognition, and machine translation. The repository also points to supporting tools: TorchScale, which targets stable and efficient training when scaling Transformers, and Aggressive Decoding, which targets faster sequence generation at inference time.

https://github.com/microsoft/unilm

GitHub - microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

#python #foundation_models #vision_language_model #vision_language_pretraining

DeepSeek-VL is an open-source vision-language (VL) model for understanding images together with text. It handles logical diagrams, web pages, scientific literature, and natural images, and supports tasks such as describing images and recognizing formulas. The model is available in several sizes and variants. The weights are free to download and may be used commercially, subject to the project's licenses.

https://github.com/deepseek-ai/DeepSeek-VL

GitHub - deepseek-ai/DeepSeek-VL: DeepSeek-VL: Towards Real-World Vision-Language Understanding

#python #any_to_any #foundation_models #llm #multimodal #unified_model #vision_language_pretraining

The Janus series (Janus, Janus-Pro, and JanusFlow) unifies multimodal understanding and generation in one model family. The models take both text and images, so they can answer questions about an image and generate images from text descriptions. Janus-Pro improves on Janus through optimized training strategies and larger model sizes. JanusFlow combines an autoregressive language model with rectified flow for efficient image generation. Together they target multimodal tasks in both research and industry.

https://github.com/deepseek-ai/Janus

GitHub - deepseek-ai/Janus: Janus-Series: Unified Multimodal Understanding and Generation Models

#python #ai #big_model #data_parallelism #deep_learning #distributed_computing #foundation_models #heterogeneous_training #hpc #inference #large_scale #model_parallelism #pipeline_parallelism

Colossal-AI is a system for making large AI models cheaper, faster, and more accessible. It combines data, pipeline, and model parallelism with heterogeneous training to speed up training and inference and to reduce the hardware needed for large models. The project's stated aim is to let you write distributed training code much as you would write a model on a single machine, which saves engineering time as well as compute cost. It also ships example applications in areas such as medicine, video generation, and chatbots.
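
As a rough illustration of the data-parallel idea behind tools like this, here is a toy sketch in plain Python. It is not Colossal-AI's API: the names `grad` and `data_parallel_grad` are hypothetical, the "workers" are simulated in a single process, and the batch is assumed to split evenly across them. It shows that averaging per-shard gradients reproduces the full-batch gradient, which is what lets each worker process only part of the data.

```python
# Toy illustration of data parallelism; this is NOT Colossal-AI's API.
# Model: y = w * x, loss = mean((w * x - y)^2) over the batch.

def grad(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def data_parallel_grad(w, xs, ys, num_workers):
    """Split the batch into equal shards, compute one gradient per shard,
    then average them (the role an all-reduce plays on real hardware).
    Assumes len(xs) is divisible by num_workers."""
    shard = len(xs) // num_workers
    grads = [
        grad(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    return sum(grads) / num_workers

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with the "true" weight w = 2

w = 0.0
for _ in range(50):
    # Each step uses the averaged shard gradients instead of the full batch.
    w -= 0.05 * data_parallel_grad(w, xs, ys, num_workers=2)
```

With equal-sized shards the averaged gradient matches the full-batch gradient, so the simulated two-worker run follows the same path a single worker would and `w` approaches 2. Pipeline and model parallelism split the model itself rather than the data and are not shown here.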

https://github.com/hpcaitech/ColossalAI

GitHub - hpcaitech/ColossalAI: Making large AI models cheaper, faster and more accessible
