GitHub Trends

#python #beit #beit_3 #bitnet #deepnet #document_ai #foundation_models #kosmos #kosmos_1 #layoutlm #layoutxlm #llm #minilm #mllm #multimodal #nlp #pre_trained_model #textdiffuser #trocr #unilm #xlm_e

Microsoft's unilm repository collects its work on large-scale self-supervised pre-training across tasks, languages, and modalities. The models, including Foundation Transformers (Magneto) and Kosmos-2.5, are built to generalize across language understanding, vision, speech, and multimodal tasks. The project reports state-of-the-art results in areas such as document AI, speech recognition, and machine translation. Companion tools such as TorchScale and Aggressive Decoding target stability, efficiency, and speed in model training and deployment.

https://github.com/microsoft/unilm

GitHub - microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

310 views · 14:15

GitHub Trends

#python #agent_computer_interface #ai_agents #computer_automation #computer_use #grounding #gui_agents #in_context_reinforcement_learning #memory #mllm #planning #retrieval_augmented_generation

Agent S2 is an AI agent that carries out computer tasks by breaking them into smaller steps and handing each step to a specialized tool, which lets it adapt across systems such as Windows and Android. According to the project, it outperforms other agents on complex tasks. It also learns from experience and adjusts its plans as needed, helping users automate digital work more reliably.

https://github.com/simular-ai/Agent-S

GitHub - simular-ai/Agent-S: Agent S: an open agentic framework that uses computers like a human

530 views · 13:00

GitHub Trends

#python #mllm #point_clouds #scene_understanding #spatial_intelligence

SpatialLM is a 3D language model that converts point cloud data from videos, RGBD images, or LiDAR into structured 3D scene layouts with labeled walls, doors, windows, and objects. It needs no special capture equipment and can detect user-specified object categories, which makes it useful for robotics, navigation, and 3D design. You can run it on your own data, visualize the results, and customize the detection task.

https://github.com/manycore-research/SpatialLM

GitHub - manycore-research/SpatialLM: [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling

396 views · 12:30
