GitHub Trends
10.1K subscribers
15.3K links
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
#python #beit #beit_3 #bitnet #deepnet #document_ai #foundation_models #kosmos #kosmos_1 #layoutlm #layoutxlm #llm #minilm #mllm #multimodal #nlp #pre_trained_model #textdiffuser #trocr #unilm #xlm_e

Microsoft is developing advanced AI models through large-scale self-supervised pre-training across tasks, languages, and modalities. These models, such as Foundation Transformers (Magneto) and Kosmos-2.5, are designed to generalize well and to handle work spanning language understanding, vision, speech, and multimodal interaction. For users, this translates into state-of-the-art results in document AI, speech recognition, machine translation, and more. The repository also includes TorchScale, a library for training Transformers stably and efficiently at scale, and Aggressive Decoding, a technique for speeding up inference.

https://github.com/microsoft/unilm
#python #foundation_models #vision_language_model #vision_language_pretraining

DeepSeek-VL is an open-source Vision-Language (VL) model for understanding and interacting with images and text together. It can process diverse inputs such as logical diagrams, web pages, scientific literature, and natural images, and supports applications like image description and formula recognition. The model comes in several sizes and variants, and the weights are free to download and use, including for commercial purposes, under the specified licenses. This makes it easier to add vision-language understanding to your projects.

https://github.com/deepseek-ai/DeepSeek-VL
#python #any_to_any #foundation_models #llm #multimodal #unified_model #vision_language_pretraining

The Janus-Series models, including Janus, Janus-Pro, and JanusFlow, combine multimodal understanding and generation in a single family. They process both text and images, supporting tasks such as answering questions about images and generating images from text descriptions. Janus-Pro improves on Janus through optimized training strategies and larger model sizes. JanusFlow integrates an autoregressive language model with rectified flow for efficient image generation. Together, they let users tackle complex multimodal tasks in both research and industry settings.

https://github.com/deepseek-ai/Janus
#python #ai #big_model #data_parallelism #deep_learning #distributed_computing #foundation_models #heterogeneous_training #hpc #inference #large_scale #model_parallelism #pipeline_parallelism

Colossal-AI is a system for making large AI models faster, cheaper, and easier to train and deploy. It combines data, pipeline, and model parallelism with heterogeneous training, which also uses CPU memory alongside GPU memory, to speed up training of large models and lower the hardware required. This lets users train large models on more modest hardware, saving time and money. Colossal-AI also supports applications across industries such as medicine, video generation, and chatbots.

https://github.com/hpcaitech/ColossalAI
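To make the parallelism idea concrete, here is a minimal, framework-free sketch of data parallelism, one of the techniques tagged above. It does not use Colossal-AI's API; the toy linear model and the helper names `grad_mse`, `shard`, and `data_parallel_step` are hypothetical. Each simulated worker computes a gradient on its own shard of the batch, the gradients are averaged (standing in for an all-reduce), and every replica applies the same update.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]


def grad_mse(w: float, batch: Batch) -> float:
    """Gradient of mean((w * x - y)^2) with respect to w for a toy model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def shard(batch: Batch, num_workers: int) -> List[Batch]:
    """Split a batch into num_workers equal, contiguous shards (one per simulated worker)."""
    assert len(batch) % num_workers == 0, "batch size must divide evenly across workers"
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(w: float, batch: Batch, num_workers: int, lr: float) -> float:
    """One synchronous data-parallel SGD step, simulated in a single process."""
    # Each worker computes a local gradient on its own shard.
    local_grads = [grad_mse(w, s) for s in shard(batch, num_workers)]
    # Stand-in for an all-reduce: average so every worker holds the same gradient.
    avg_grad = sum(local_grads) / num_workers
    # Every replica applies the identical update, so the weights stay in sync.
    return w - lr * avg_grad


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # target weight is 3.0
    w = 0.0
    for _ in range(50):
        w = data_parallel_step(w, data, num_workers=4, lr=0.01)
    print(round(w, 4))
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the result matches single-device training. Real systems replace the Python average with a collective all-reduce across GPUs.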