GitHub Trends
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
#python #beit #beit_3 #bitnet #deepnet #document_ai #foundation_models #kosmos #kosmos_1 #layoutlm #layoutxlm #llm #minilm #mllm #multimodal #nlp #pre_trained_model #textdiffuser #trocr #unilm #xlm_e

Microsoft's UniLM repository collects research on large-scale self-supervised pre-training across tasks, languages, and modalities. Its models, such as Foundation Transformers (Magneto) and Kosmos-2.5, are designed to generalize well and handle many kinds of tasks, including language understanding, vision, speech, and multimodal interaction. For users, this means state-of-the-art performance in document AI, speech recognition, machine translation, and more, from models versatile and efficient enough for a wide range of applications. Tools such as TorchScale and Aggressive Decoding also improve stability, efficiency, and speed in model training and deployment.

https://github.com/microsoft/unilm
#python #agent_computer_interface #ai_agents #computer_automation #computer_use #grounding #gui_agents #in_context_reinforcement_learning #memory #mllm #planning #retrieval_augmented_generation

Agent S2 is an AI agent that operates a computer on the user's behalf. It breaks a task into smaller steps and hands each step to a specialized module, which makes it adaptable and efficient across platforms such as Windows and Android. The authors report that it outperforms other computer-use agents on complex tasks. It also learns from experience and adjusts its plan as needed, helping users automate digital work more reliably.

https://github.com/simular-ai/Agent-S
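The decompose-then-delegate idea described above can be sketched in a few lines of Python. This is a toy illustration, not Agent S2's code or API: the `kind: arg` goal syntax, the `decompose` planner, and the `click`/`type` specialists are hypothetical stand-ins for the model-based planner and grounding modules, and learning from experience and replanning are not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Subtask:
    kind: str  # which specialist should handle the step (hypothetical labels)
    arg: str   # what the step acts on, e.g. a button name or text to type


def decompose(goal: str) -> list[Subtask]:
    """Toy planner: split a 'kind: arg; kind: arg' string into subtasks.

    Stands in for the model-based planner that breaks a task into steps.
    """
    steps = []
    for part in goal.split(";"):
        kind, _, arg = part.strip().partition(":")
        steps.append(Subtask(kind.strip(), arg.strip()))
    return steps


# Hypothetical specialists keyed by subtask kind; each returns an action string.
SPECIALISTS: dict[str, Callable[[str], str]] = {
    "click": lambda arg: f"click({arg!r})",
    "type": lambda arg: f"type({arg!r})",
}


def generalist(sub: Subtask) -> str:
    """Fallback used when no specialist is registered for a subtask kind."""
    return f"generic({sub.kind!r}, {sub.arg!r})"


def run(goal: str) -> list[str]:
    """Plan the goal, then route each step to a specialist or the generalist."""
    actions = []
    for sub in decompose(goal):
        handler = SPECIALISTS.get(sub.kind)
        actions.append(handler(sub.arg) if handler else generalist(sub))
    return actions
```

For example, `run("click: File; type: report.txt; scroll: down")` returns `["click('File')", "type('report.txt')", "generic('scroll', 'down')"]`. The step with no matching specialist falls back to the generalist. Keeping specialists in a lookup table means a new kind of step only needs a new entry.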
#python #mllm #point_clouds #scene_understanding #spatial_intelligence

SpatialLM is a 3D large language model that turns 3D point clouds, whether reconstructed from monocular video or captured with RGBD cameras or LiDAR, into structured scene layouts: walls, doors, and windows, plus labeled bounding boxes for objects. Because it accepts point clouds derived from ordinary video, it does not require specialized capture equipment, and it can restrict detection to user-specified object categories. This makes it useful for understanding and analyzing indoor spaces in robotics, navigation, and 3D design. You can run it on your own data, visualize the results, and customize the detection task.

https://github.com/manycore-research/SpatialLM
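A structured layout of the kind described above can be pictured as plain data. The sketch below is hypothetical and does not reproduce SpatialLM's actual output format. The `SceneLayout`, `Wall`, and `Box` classes, their fields, and the metre units are assumptions, and doors and windows are simplified to boxes. It shows how detections could be filtered to user-specified categories and then queried.

```python
import math
from dataclasses import dataclass, field

Point2 = tuple[float, float]
Point3 = tuple[float, float, float]


@dataclass
class Wall:
    start: Point2  # floor-plan endpoint (x, y), metres (assumed unit)
    end: Point2
    height: float


@dataclass
class Box:
    label: str        # semantic category, e.g. "sofa"
    center: Point3    # (x, y, z) in scene coordinates
    size: Point3      # extent along x, y, z
    yaw: float = 0.0  # rotation about the vertical axis, radians


@dataclass
class SceneLayout:
    walls: list[Wall] = field(default_factory=list)
    doors: list[Box] = field(default_factory=list)    # simplified as boxes
    windows: list[Box] = field(default_factory=list)  # simplified as boxes
    objects: list[Box] = field(default_factory=list)

    def filter_objects(self, categories: set[str]) -> list[Box]:
        """Keep only objects whose label is one of the requested categories."""
        return [obj for obj in self.objects if obj.label in categories]

    def total_wall_length(self) -> float:
        """Sum of wall lengths measured on the floor plan."""
        return sum(math.dist(w.start, w.end) for w in self.walls)
```

For example, a layout holding a sofa, a bed, and a chair returns only the sofa and chair from `filter_objects({"sofa", "chair"})`. Two walls from (0, 0) to (4, 0) and from (4, 0) to (4, 3) give a `total_wall_length()` of 7.0.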