GitHub Trends

#python #augmix #convnext #distributed_training #dual_path_networks #efficientnet #image_classification #imagenet #maxvit #mixnet #mobile_deep_learning #mobilenet_v2 #mobilenetv3 #nfnets #normalization_free_training #pretrained_models #pretrained_weights #pytorch #randaugment #resnet #vision_transformer_models

PyTorch Image Models (`timm`) is a comprehensive library that includes a wide range of state-of-the-art image models, layers, utilities, optimizers, and training scripts. Here are its key benefits:
- **Model Variety**: `timm` offers over 300 pre-trained models from families such as Vision Transformers, ResNets, and EfficientNets, so you can choose the best model for your task.
- **Feature Extraction**: You can extract features at different levels of the network using `features_only=True` and `out_indices`, which makes the models versatile backbones for other applications.
- **Augmentation and Regularization**: It provides augmentation techniques such as AutoAugment and RandAugment, and regularization methods such as DropPath and DropBlock, to improve model performance.
- **Reference Training Scripts**: Included are high-performance training, validation, and inference scripts that support multiple GPUs and mixed-precision training.
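
The DropPath regularizer mentioned above (also called stochastic depth) randomly skips a residual branch for an entire sample during training and rescales the surviving branches so the expected output stays the same. Below is a minimal plain-Python sketch of that idea, assuming one sample at a time represented as a list of floats; the function names are hypothetical, and timm's own DropPath layer works on batched PyTorch tensors instead.

```python
import random


def drop_path(branch, drop_prob, training, rng):
    """Stochastic depth applied to one sample's residual-branch output.

    In training mode the whole branch is zeroed with probability `drop_prob`;
    otherwise it is scaled by 1 / keep_prob so its expected value is unchanged.
    In evaluation mode the branch passes through untouched.
    """
    if not training or drop_prob == 0.0:
        return list(branch)
    keep_prob = 1.0 - drop_prob
    if rng.random() < keep_prob:
        return [v / keep_prob for v in branch]
    return [0.0] * len(branch)


def residual_block(x, branch_fn, drop_prob, training, rng):
    """Hypothetical residual block computing x + DropPath(branch_fn(x))."""
    branch = drop_path(branch_fn(x), drop_prob, training, rng)
    return [a + b for a, b in zip(x, branch)]


def double_branch(v):
    """Toy stand-in for a residual branch (e.g. an attention or MLP block)."""
    return [2.0 * e for e in v]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 2.0, 3.0]
    # Training: the branch is either dropped or rescaled by 1 / 0.8.
    print(residual_block(x, double_branch, drop_prob=0.2, training=True, rng=rng))
    # Evaluation: always x + branch.
    print(residual_block(x, double_branch, drop_prob=0.2, training=False, rng=rng))
```

Scaling the kept branch by `1 / keep_prob` during training is what lets the same block be used unchanged at inference time.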

Overall, `timm` simplifies the process of working with deep learning models for image tasks by providing a unified interface and extensive tools for training and evaluation.

https://github.com/huggingface/pytorch-image-models

GitHub - huggingface/pytorch-image-models: The largest collection of PyTorch image encoders / backbones. Including train, eval…

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V...

292 views · 13:54

#python #chinese #clip #computer_vision #contrastive_loss #coreml_models #deep_learning #image_text_retrieval #multi_modal #multi_modal_learning #nlp #pretrained_models #pytorch #transformers #vision_and_language_pre_training #vision_language

This project is a Chinese version of the CLIP (Contrastive Language-Image Pretraining) model, trained on a large dataset of Chinese text and images. Here's what you need to know:
- **Functionality**: The model helps you quickly compute text and image features, perform cross-modal retrieval (finding images based on text or vice versa), and run zero-shot image classification (classifying images without task-specific labeled training examples).
- **Performance**: The model has been tested on various datasets and shows strong performance in zero-shot image classification and cross-modal retrieval tasks.
- **Resources**: The project includes pre-trained models, training and testing code, and detailed tutorials on how to use the model for different tasks.
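
Zero-shot classification with a CLIP-style model reduces to comparing one image embedding with the text embeddings of candidate label prompts. The sketch below assumes both encoders have already produced the embeddings; the toy 3-D vectors, the labels, and the `logit_scale` of 100 are placeholders, not values from Chinese-CLIP. It normalizes the vectors, takes cosine similarities, and turns them into probabilities with a softmax.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Pick the label whose prompt embedding is most similar to the image.

    image_emb: embedding of one image produced by the image encoder.
    text_embs: one embedding per label prompt produced by the text encoder
               (for Chinese-CLIP the prompts would be Chinese text).
    logit_scale: temperature multiplier; 100.0 is a placeholder value.
    Returns (best_label, {label: probability}).
    """
    img = l2_normalize(image_emb)
    logits = []
    for emb in text_embs:
        txt = l2_normalize(emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    # Numerically stable softmax over the candidate labels.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


if __name__ == "__main__":
    labels = ["cat", "dog", "car"]
    # Toy 3-D embeddings; real CLIP embeddings have hundreds of dimensions.
    text_embs = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
    image_emb = [0.9, 0.2, 0.05]
    label, probs = zero_shot_classify(image_emb, text_embs, labels)
    print(label)  # cat
```

Cross-modal retrieval uses the same cosine similarity: for text-to-image search, one text embedding is scored against many image embeddings and the images are ranked by score.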

Overall, this project makes it easy to work with Chinese text and images using advanced AI techniques, saving you time and effort.

https://github.com/OFA-Sys/Chinese-CLIP

GitHub - OFA-Sys/Chinese-CLIP: Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

357 views · 14:30

#swift #ai #android #barcode #camera #instagram #ios #javascript #jsi #library #native #qr #qrcode #react #react_native #react_native_camera #scanner #snapchat #typescript #vision #worklet

VisionCamera is a powerful camera library for React Native that offers many useful features. You can capture photos and videos, scan QR codes and barcodes, use multiple cameras, and adjust resolutions and frame rates. It also supports advanced features like facial recognition, object detection, and real-time video chats through frame processors. Additionally, you can draw shapes, text, and filters on the camera view, and it includes smooth zooming, fast pause and resume, HDR and night modes, and a custom video pipeline. Installing it is easy with npm, and there are detailed guides and examples to help you get started. Using VisionCamera can enhance your app's camera capabilities significantly.

https://github.com/mrousavy/react-native-vision-camera

GitHub - mrousavy/react-native-vision-camera: 📸 A powerful, high-performance React Native Camera library.

487 views · 20:00

#python #api #automation #browser #browser_automation #computer #gpt #llm #playwright #python #rpa #vision #workflow

Skyvern is a tool that automates browser-based workflows using Large Language Models (LLMs) and computer vision. It can interact with websites without custom scripts, which makes it resistant to website layout changes. Here's how it benefits you:
- **Adaptability**: Skyvern can handle tasks on websites it has never seen before, such as filling out forms, extracting data, and even handling 2FA.
- **Flexibility**: Unlike traditional automation methods that depend on hard-coded selectors, Skyvern is less likely to break when website layouts change.
- **Ease of Use**: You can create tasks and workflows through a simple API or a user-friendly UI, without writing complex code.

Overall, Skyvern simplifies and stabilizes the automation of web-based tasks, making it easier to manage and scale your workflows.

https://github.com/Skyvern-AI/skyvern

GitHub - Skyvern-AI/skyvern: Automate browser based workflows with AI

329 views · 17:00

#rust #ai #computer_vision #llm #machine_learning #ml #multimodal #vision

ScreenPipe is an AI assistant that records your screen and microphone 24/7, giving AI tools the full context of what you have seen and heard. It works like a personal recorder that helps you remember everything. You can use it as a desktop app or a command-line tool, or integrate it into other applications, so important details are less likely to slip past you. It is open source and runs 100% locally, so you can customize it to your needs and keep your recordings on your own machine. ScreenPipe can help you stay organized and make your own history available to increasingly capable AI tools.

https://github.com/mediar-ai/screenpipe

GitHub - mediar-ai/screenpipe: AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen…

AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording - mediar-ai/screenpipe

437 views · 13:00

#python #auto_regressive_model #autoregressive_models #diffusion_models #generative_ai #generative_model #gpt #gpt_2 #image_generation #large_language_models #neurips #transformers #vision_transformer

VAR (Visual Autoregressive Modeling) is a new way to generate images that improves on existing methods. It uses a "next-scale prediction" approach: instead of the traditional autoregressive method of predicting image tokens one by one in raster order, it generates images from coarse to fine resolution, predicting a whole token map at each scale. According to the authors, this makes GPT-style autoregressive models outperform diffusion transformers for the first time. You can try VAR on a demo website and generate images interactively. VAR also exhibits power-law scaling laws, meaning its performance improves predictably as model size and compute increase. The benefit to you is that you can create high-quality images quickly and explore the technical details through the provided scripts and models.
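
A rough step count shows why next-scale prediction needs far fewer sequential passes than raster-order next-token prediction. The sketch assumes square token maps and an illustrative scale schedule ending at 16x16; all tokens of one scale are treated as generated in a single parallel step conditioned on the coarser scales. It is arithmetic only, not VAR's model code.

```python
def next_scale_stats(scales):
    """Count sequential steps for next-scale vs. next-token prediction.

    scales: side lengths of square token maps, coarsest first.
    Next-scale prediction emits one whole token map per step (all tokens of
    a scale in parallel), so it needs len(scales) sequential steps.
    Raster-order next-token prediction of the final map alone needs
    side * side sequential steps, one per token.
    """
    tokens_per_step = [s * s for s in scales]
    final_side = scales[-1]
    return {
        "next_scale_steps": len(scales),
        "tokens_per_step": tokens_per_step,
        "total_tokens": sum(tokens_per_step),
        "next_token_steps": final_side * final_side,
    }


if __name__ == "__main__":
    # Assumed coarse-to-fine schedule ending in a 16x16 token map (illustrative).
    schedule = [1, 2, 3, 4, 5, 6, 8, 10, 13, 16]
    stats = next_scale_stats(schedule)
    print(stats["next_scale_steps"], stats["next_token_steps"], stats["total_tokens"])
    # prints: 10 256 680
```

Under this schedule, next-scale prediction produces more tokens in total (680, since every coarser map is generated too) but needs only 10 sequential passes instead of 256, which is one reason it can be faster.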

https://github.com/FoundationVision/VAR

GitHub - FoundationVision/VAR: [NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official…

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Predi...

411 views · 13:30

#python #agent #ai #asr #cpp #gemini #golang #gpt_4 #gpt_4o #llm #low_latency #multimodal #nextjs14 #openai #python #rag #real_time #realtime #tts #vision #voice_assistant

The TEN Agent is a powerful tool that helps you create and manage AI agents with various capabilities like real-time vision, screen detection, and integration with services like Google Gemini Multimodal Live API, Weather Check, and Web Search. To use it, you need to set up your environment with Docker, Node.js, and specific API keys. You can follow simple steps to configure and start your agent locally. The benefits include easy integration of advanced AI features, a supportive community through Discord and GitHub discussions, and the ability to customize and extend your agents with ready-to-use extensions. This makes it easier to develop and deploy sophisticated AI applications quickly.

https://github.com/TEN-framework/TEN-Agent

GitHub - TEN-framework/ten-framework: Open-source framework for conversational voice AI agents

470 views · 12:00

#python #foundation_models #vision_language_model #vision_language_pretraining

DeepSeek-VL is a powerful, open-source Vision-Language (VL) Model that helps you understand and interact with both images and text. It can process various types of data like logical diagrams, web pages, scientific literature, and natural images. You can use it for different applications, such as describing images, recognizing formulas, and more. The model is available in different sizes and variants, making it flexible for various needs. You can download and use the models freely, even for commercial purposes, under the specified licenses. This tool makes it easier to integrate vision and language understanding into your projects.

https://github.com/deepseek-ai/DeepSeek-VL

GitHub - deepseek-ai/DeepSeek-VL: DeepSeek-VL: Towards Real-World Vision-Language Understanding

392 views · 13:30

#python #any_to_any #foundation_models #llm #multimodal #unified_model #vision_language_pretraining

The Janus-Series models, including Janus, Janus-Pro, and JanusFlow, are advanced AI tools that combine multimodal understanding and generation capabilities. These models can process both text and images, allowing for tasks like answering questions based on images and generating images from text descriptions. Janus-Pro is an improved version with better performance due to optimized training strategies and larger model sizes. JanusFlow integrates autoregressive language models with rectified flow for efficient image generation. The benefit to the user is the ability to perform complex multimodal tasks with high accuracy and flexibility, making these models useful for a wide range of applications in research and industry.
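
Rectified flow, which JanusFlow uses for image generation, trains a network to predict a velocity that carries noise toward data along approximately straight paths; sampling then integrates that velocity with a few Euler steps. In the toy sketch below, the learned network is replaced by the exact velocity toward a single known target point (an assumption that makes the paths perfectly straight), which shows why a straight flow can be sampled in very few steps. It is not JanusFlow's code.

```python
import random


def ideal_velocity(x, t, target):
    """Velocity of the straight path from x that reaches `target` at t = 1.

    Stands in for the learned network v(x, t); only valid for t < 1.
    """
    return [(y - xi) / (1.0 - t) for xi, y in zip(x, target)]


def euler_sample(x0, target, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data) with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # the last evaluated time is 1 - dt, so no division by zero
        v = ideal_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4)]  # x0 ~ N(0, I)
    data_point = [0.5, -1.0, 2.0, 0.0]               # toy "image"
    for steps in (1, 4, 32):
        print(steps, [round(v, 6) for v in euler_sample(noise, data_point, steps)])
```

Because the toy paths are exactly straight, even a single Euler step lands on the target. A learned network only approximates straight paths, so real samplers use more steps, but the straighter the flow, the fewer steps are needed.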

https://github.com/deepseek-ai/Janus

GitHub - deepseek-ai/Janus: Janus-Series: Unified Multimodal Understanding and Generation Models

445 views · 14:30

#typescript #agent #browser_use #computer_use #electron #gui_agents #mcp #mcp_server #vision #vite #vlm

Agent TARS is an open-source multimodal AI agent that automates tasks. It connects to many tools, including through MCP servers, and can handle complex tasks such as web scraping and data analysis, which makes workflows easier to manage and less error-prone. Users can set up an automation in just a few steps. Agent TARS also supports advanced browser operations and has a user-friendly desktop app, so it is approachable even for non-experts. Overall, it helps users save time and work more efficiently.

https://github.com/bytedance/UI-TARS-desktop

GitHub - bytedance/UI-TARS-desktop: The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

512 views · 13:00

#go #anticensorship #dns #network #proxy #reality #shadowsocks #socks5 #tls #trojan #tunnel #utls #vision #vless #vmess #vpn #wireguard #xhttp #xray #xtls #xudp

Project X offers network tools such as Xray-core and REALITY, built on the XTLS protocol, which improves speed by avoiding redundant encryption of traffic that is already TLS-encrypted. It features advanced routing and fallback systems that keep your internet traffic protected and uninterrupted, which suits streaming and video calls. The project is open source under the Mozilla Public License 2.0, and community contributions keep it evolving. You can install it on various platforms using official scripts, Docker, or one-click setups, and use many supported GUI clients on Windows, Linux, Android, iOS, and routers. This flexibility and strong security help you optimize and protect your network experience.
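
The routing idea can be pictured as a first-match rule table that sends each connection to a named outbound. The sketch below is a hypothetical plain-Python model of that concept, not Xray's implementation or configuration format; in Xray itself, routing rules are declared in the JSON configuration and point to an outbound by its tag. The rule table, tag names, and default outbound here are made up for illustration.

```python
def route(host, rules, default="proxy"):
    """Return the outbound tag for `host` using first-match routing rules.

    rules: ordered list of (domain_suffixes, outbound_tag) pairs.
    A suffix matches the domain itself and any of its subdomains, so
    "example.com" matches "a.example.com" but not "notexample.com".
    Hosts that match no rule go to `default`.
    """
    host = host.lower().rstrip(".")
    for suffixes, outbound in rules:
        for suffix in suffixes:
            if host == suffix or host.endswith("." + suffix):
                return outbound
    return default


if __name__ == "__main__":
    # Hypothetical rule table; tag names are illustrative only.
    rules = [
        (["ads.example"], "block"),
        (["lan", "local"], "direct"),
    ]
    for h in ("tracker.ads.example", "printer.local", "github.com"):
        print(h, "->", route(h, rules))
```

Rule order matters in a first-match table: placing narrow rules (such as blocking) before broad ones keeps them from being shadowed.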

https://github.com/XTLS/Xray-core

GitHub - XTLS/Xray-core: Xray, Penetrates Everything. Also the best v2ray-core. Where the magic happens. An open platform for various…

Xray, Penetrates Everything. Also the best v2ray-core. Where the magic happens. An open platform for various uses. - XTLS/Xray-core

491 views · 13:00

#typescript #ai #anthropic #artifacts #assistant_api #aws #azure #chatgpt #chatgpt_clone #claude #clone #dall_e_3 #deepseek #gemini #google #librechat #o1 #openai #plugins #vision #webui

LibreChat is a free, open-source AI chatbot platform that lets you use models from many providers, such as OpenAI, Anthropic, and AWS, in one place. It offers advanced features such as secure code execution in multiple programming languages, AI assistants that can work with files and tools without coding, and the ability to generate images and diagrams directly in chat. You can search conversations easily, manage multiple chat threads, and customize the interface to fit your needs. LibreChat supports multiple languages, speech input/output, and secure multi-user access. It can be deployed locally or in the cloud, giving you flexibility and control over your AI experience. This means you get a powerful, customizable AI assistant without paying for ChatGPT Plus or relying on a single provider[1][3][5].

https://github.com/danny-avila/LibreChat

GitHub - danny-avila/LibreChat: Enhanced ChatGPT Clone: Features Agents, MCP, DeepSeek, Anthropic, AWS, OpenAI, Responses API,…

Enhanced ChatGPT Clone: Features Agents, MCP, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message...

340 views · 13:00

#python #agent #context_engineering #electron #embedding_models #memory #proactive_ai #python #python3 #rag #react #vector_database #vision_language_model

MineContext is a proactive, context-aware AI tool that helps you work more efficiently. It collects information from your computer screen and other sources, then uses this data to give you insights, summaries, and reminders, helping you stay organized and focused on important tasks. It is also privacy-friendly because it stores your data on your local device rather than in the cloud. It works like a personal assistant for managing your digital life.

https://github.com/volcengine/MineContext

GitHub - volcengine/MineContext: MineContext is your proactive context-aware AI partner（Context-Engineering+ChatGPT Pulse）

448 views · 13:30
