GitHub Trends
10.1K subscribers
15.3K links
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
Download Telegram
#python #artificial_intelligence #attention_mechanism #computer_vision #image_classification #transformers

This text describes a comprehensive implementation of Vision Transformers (ViT) in PyTorch, offering various models and techniques for image classification. Here’s the key information and benefits**
- The repository provides multiple ViT variants, including the original ViT, Simple ViT, NaViT, Deep ViT, CaiT, Token-to-Token ViT, CCT, Cross ViT, PiT, LeViT, CvT, Twins SVT, RegionViT, CrossFormer, ScalableViT, SepViT, MaxViT, NesT, MobileViT, XCiT, and others.
- Each variant introduces different architectural improvements such as efficient attention mechanisms, multi-scale processing, and innovative embedding techniques.
- The implementation includes pre-trained models and supports various tasks like masked image modeling, distillation, and self-supervised learning.

**Benefits** Users can choose from a wide range of ViT models tailored for different needs, such as efficiency, performance, or specific tasks.
- **Performance** Some models, like NaViT and ScalableViT, are designed to be more efficient in terms of computational resources and training time.
- **Ease of Use** The inclusion of various research ideas and techniques allows users to explore new approaches in vision transformer research.

Overall, this repository offers a powerful toolkit for anyone working with vision transformers, providing both practical solutions and cutting-edge research opportunities.

https://github.com/lucidrains/vit-pytorch
👍1
#python #annotation #annotation_tool #annotations #boundingbox #computer_vision #computer_vision_annotation #dataset #deep_learning #image_annotation #image_classification #image_labeling #image_labelling_tool #imagenet #labeling #labeling_tool #object_detection #pytorch #semantic_segmentation #tensorflow #video_annotation

CVAT is a powerful tool for annotating videos and images, especially useful for computer vision projects. It helps developers and companies annotate data quickly and efficiently. You can use CVAT online for free or subscribe for more features like unlimited data and integrations with other tools. It also offers a self-hosted option with enterprise support. CVAT supports many annotation formats and has automatic labeling options to speed up your work. It's widely used by many teams worldwide, making it a reliable choice for your data annotation needs.

https://github.com/cvat-ai/cvat
#python #auto_regressive_model #autoregressive_models #diffusion_models #generative_ai #generative_model #gpt #gpt_2 #image_generation #large_language_models #neurips #transformers #vision_transformer

VAR (Visual Autoregressive Modeling) is a new way to generate images that improves upon existing methods. It uses a "next-scale prediction" approach, which means it generates images from coarse to fine details, unlike the traditional method of predicting pixel by pixel. This makes VAR models better than diffusion models for the first time. You can try VAR on a demo website and generate images interactively, which is fun and easy. VAR also follows power-law scaling laws, making it efficient and scalable. The benefit to you is that you can create high-quality images quickly and easily, and even explore technical details through provided scripts and models.

https://github.com/FoundationVision/VAR
👍1😁1
#python #3d_creation #3d_generation #aigc #diffusion_models #generative_model #image_to_3d

DreamCraft3D is a method to create highly detailed and realistic 3D objects using a combination of 2D reference images and advanced algorithms. It ensures that the 3D objects look consistent from all angles and have realistic textures. This is achieved by using a special technique called "Bootstrapped Score Distillation" which improves both the shape and texture of the 3D object in a way that reinforces each other. The benefit to the user is that they can generate very realistic 3D models quickly and accurately, which can be useful for various applications such as video games, movies, and architectural design.

https://github.com/deepseek-ai/DreamCraft3D
1
#python #image_processing #ocr #pdf #python #tesseract

OCRmyPDF is a tool that makes scanned PDF files searchable and editable. It adds a text layer to the PDF, so you can search for words or copy and paste text from the document. It supports many languages, fixes misrotated or crooked pages, and optimizes the file size. The tool works on various operating systems like Linux, Windows, and macOS, and it uses multiple CPU cores to speed up the process. This makes it easier to work with scanned documents and keeps your files organized and searchable.

https://github.com/ocrmypdf/OCRmyPDF
#kotlin #aes_256 #android #background_removal #clean_architecture #crop #djvu #edit_photo #exif #f_droid #filter_image #image_manipulation #jetpack_compose #jxl #kotlin #material_you #ocr_recognition #pdf #psd #qrcode_scanner #watermark

Image Toolbox is a powerful and versatile image editing tool that lets you do many things with your photos. You can crop, apply over 230 different filters, edit EXIF data, remove backgrounds, and even convert images to PDFs. It also allows you to add stickers and text, extract text from images in over 120 languages, and encrypt files with AES-256 encryption. You can resize images using various scaling algorithms, convert between multiple image formats, and create collages. The app also supports GIF, WEBP, APNG, and JXL conversions, document scanning, QR code scanning and creation, and more. It has a simple interface but offers many advanced features, making it useful for both photographers and developers.

https://github.com/T8RIN/ImageToolbox
#go #automation #c #go #golang #hook #image #mouse #opencv #robot #robotgo #rpa #window

Robotgo is a tool that helps automate tasks on your computer using the Go programming language. It can control the mouse and keyboard, capture screenshots, and work with windows. This means you can use it to automatically do things like scrolling, clicking, or typing text. Robotgo works on Windows, Mac, and Linux systems, making it very versatile. Using Robotgo can save time by automating repetitive tasks, allowing you to focus on more important things.

https://github.com/go-vgo/robotgo
#jupyter_notebook #cnn #colab #colab_notebook #computer_vision #deep_learning #deep_neural_networks #fourier #fourier_convolutions #fourier_transform #gan #generative_adversarial_network #generative_adversarial_networks #high_resolution #image_inpainting #inpainting #inpainting_algorithm #inpainting_methods #pytorch

LaMa is a powerful tool for removing objects from images. It uses special techniques called Fourier Convolutions, which help it understand the whole image at once. This makes it very good at filling in large areas that are missing. LaMa can even work well with high-resolution images, even if it was trained on smaller ones. This means you can use it to fix photos where objects are in the way, making them look natural and complete again.

https://github.com/advimman/lama
#python #3d #3d_aigc #3d_generation #diffusion_models #hunyuan3d #image_to_3d #shape #shape_generation #text_to_3d #texture_generation

Hunyuan3D 2.0 is a powerful tool that creates detailed 3D models with textures in two steps: first building the shape, then adding colors and materials. It works efficiently on standard computers (as low as 5GB VRAM for basic models) and offers multiple ways to use it, like coding, Blender plugins, or online demos, making it accessible for creating game-ready 3D assets, VR/AR content, or custom designs without needing advanced hardware.

https://github.com/Tencent/Hunyuan3D-2
#python #diffusion_models #dit #image_to_video #image_to_video_generation #text_to_video #text_to_video_generation

LTX-Video is a powerful AI model that creates high-quality, realistic videos in real time, running faster than you can watch them. It can generate videos from text descriptions, images, or existing videos, and supports advanced features like keyframe animation and video extension. You can use it online or run it locally with easy setup. It offers great control over video details, smooth motion, and works well even on consumer hardware. This helps you quickly create custom videos for storytelling, social media, or prototyping, saving time and boosting creativity with detailed, lifelike results[2][4][5].

https://github.com/Lightricks/LTX-Video
🔥1
#python #comfyui #diffusion_models #dit #image_to_video #image_to_video_generation #text_to_image #text_to_image_generation

ComfyUI-LTXVideo is a tool that helps create high-quality videos from images using AI. It offers features like key frame control, improved video quality, and faster generation speeds. This means you can make smooth videos with fewer errors and more control over how they look. It also supports commercial use, so you can use the videos for business projects. The tool is designed to work well with consumer-grade GPUs, making it accessible to more users. Overall, it helps you create professional-looking videos quickly and easily.

https://github.com/Lightricks/ComfyUI-LTXVideo
🔥1
#python #ai #ai_art #art #asset_generator #chatbot #deep_learning #desktop_app #image_generation #mistral #multimodal #privacy #pygame #pyside6 #python #self_hosted #speech_to_text #stable_diffusion #text_to_image #text_to_speech #text_to_speech_app

AI Runner is a tool that lets you use AI on your own computer without needing the internet. It can do many things like **voice chatbots**, **text-to-image** generation, and **image editing**. You can also make AI personalities for more interesting conversations. It runs fast and securely, keeping your data private. To use AI Runner, you need a good computer with a strong GPU, like an NVIDIA RTX 3060 or better. This helps keep your data safe and makes AI tasks faster.

https://github.com/Capsize-Games/airunner
#python #face_animation #image_animation #video_editing #video_generation

LivePortrait is a tool that uses AI to animate still photos, making them look like videos. It works by identifying key facial features and adding realistic movements. This technology helps create lifelike videos that can be used for personalized communication. The benefit to users is that they can easily create engaging animated portraits from static images, which can be fun and useful for various applications like social media or storytelling.

https://github.com/KwaiVGI/LivePortrait
#typescript #alternative #converter #data_manipulation #developer_tools #devtools #frontend #good_first_issue #image_manipulation #image_processing #javascript #pdf_manipulation #productivity #react #self_hosted #swissarmyknife #tools #typescript #video_manipulation #webapp #website

OmniTools is a self-hosted web app that helps with many tasks like image and video editing, number crunching, and more. It offers tools for resizing images, converting videos, calculating dates, and generating prime numbers. You can run it on your own computer using Docker, which means your data stays local. This app is open-source and free, allowing you to contribute new features or tools easily. Using OmniTools simplifies many everyday tasks and keeps your data private.

https://github.com/iib0011/omni-tools
👍1
#rust #2d_graphics #art #compositor #design #graphic_design #graphics_editor #image_generation #image_manipulation #image_processing #node_editor #node_graph #photo_editing #photo_editor #procedural #procedural_art #procedural_drawing #svg_editor #vector_editor

Graphite is a free, open-source 2D graphics editor that combines vector and raster tools with a unique hybrid workflow using layers and nodes. It lets you create detailed vector art and designs with nondestructive editing, meaning you can change your work anytime without losing quality. The node-based system offers powerful, flexible control like visual programming, while the layer system keeps things simple and familiar. This makes it easy to create complex graphics, animations, and effects all in one tool. Graphite is still evolving but aims to be a versatile, all-in-one creative platform accessible to everyone, helping you unleash your artistic potential efficiently[1][2][4].

https://github.com/GraphiteEditor/Graphite
2
#python #deep_learning #diffusion #flax #flux #hacktoberfest #image_generation #image2image #image2video #jax #latent_diffusion_models #pytorch #score_based_generative_modeling #stable_diffusion #stable_diffusion_diffusers #text2image #text2video #video2video

The Hugging Face Diffusers library is a powerful and easy-to-use tool for generating images, audio, and 3D molecular structures using advanced diffusion models. It offers ready-to-use pretrained models and flexible components like pipelines, schedulers, and model building blocks, allowing you to quickly create or customize your own diffusion-based projects. Installation is simple via pip or conda, and you can generate high-quality outputs with just a few lines of code. This library benefits you by making cutting-edge AI generation accessible, customizable, and efficient, whether you want to run models or train your own[1][2][5].

https://github.com/huggingface/diffusers
#vue #canvas_editor #design #design_editor #editor #fabricjs #image_editor #poster #svg_editor #vue_fabric

You can use a powerful open-source image editor built with fabric.js and Vue that lets you easily design images by dragging and dropping. It supports many features like importing PSD and JSON files, exporting PNG and SVG, layers, gradients, custom fonts, cropping, filters, and more. You can customize fonts, templates, right-click menus, and shortcuts, and extend it with plugins. This editor is lightweight and simple to use, making it great for quick image editing without complex tools. It also offers a paid version with full backend support and batch image generation, helping you save time and reduce development effort.

https://github.com/ikuaitu/vue-fabric-editor
#typescript #ai #ai_chatbot #angular #chat #chatbot #chatgpt #cohere #component #files #huggingface #image #nextjs #openai #react #react_chatbot #solid #speech #svelte #vue

Deep Chat is an easy-to-add AI chat tool for your website that connects with popular AI services like ChatGPT and HuggingFace or your own custom APIs using just one line of code. It supports text, voice input, speech-to-text, text-to-speech, file sharing, webcam photos, and audio recording, making conversations more interactive. You can customize everything from avatars to message styles and run small AI models directly in the browser without servers. It works with major web frameworks and offers features like local message storage and focus mode for a modern chat experience. This helps you quickly add a powerful, flexible AI chatbot that fits your needs and improves user engagement.

https://github.com/OvidijusParsiunas/deep-chat
#python #blind_watermark #image_processing #watermark #watermark_image

You can add invisible watermarks to images using a Python tool based on DWT-DCT-SVD techniques, which hides your watermark securely without changing the image's appearance. This watermark can be embedded and later extracted even if the image is rotated, cropped, resized, or altered by noise or brightness changes. You can use it easily via command line or Python code, protecting your images from unauthorized use while keeping them visually unchanged. This helps prove ownership and maintain image authenticity without affecting quality or usability. The tool supports embedding text, images, or bit arrays as watermarks and works on Windows, Linux, and macOS.

https://github.com/guofei9987/blind_watermark
1
#python #audio_generation #diffusion #image_generation #inference #model_serving #multimodal #pytorch #transformer #video_generation

vLLM-Omni is a free, open-source tool that makes serving AI models for text, images, videos, and audio fast, easy, and cheap. It builds on vLLM for top speed using smart memory tricks, overlapping tasks, and flexible resource sharing across GPUs. You get 2x higher throughput, 35% less delay, and simple setup with Hugging Face models via OpenAI API—perfect for building quick multi-modal apps like chatbots or media generators without high costs.

https://github.com/vllm-project/vllm-omni