GitHub Trends

#python #augmix #convnext #distributed_training #dual_path_networks #efficientnet #image_classification #imagenet #maxvit #mixnet #mobile_deep_learning #mobilenet_v2 #mobilenetv3 #nfnets #normalization_free_training #pretrained_models #pretrained_weights #pytorch #randaugment #resnet #vision_transformer_models

PyTorch Image Models (`timm`) is a library of state-of-the-art image models, layers, utilities, optimizers, and training scripts. Here are its key benefits:
- **Wide Range of Models**: `timm` offers over 300 pre-trained models from families such as Vision Transformers, ResNets, and EfficientNets, so you can choose the best model for your task.
- **Feature Extraction**: You can extract features at different levels of the network using `features_only=True` and `out_indices`, so the same models can serve as backbones for other tasks.
- **Augmentation and Regularization**: It provides augmentation techniques such as AutoAugment and RandAugment, and regularization methods such as DropPath and DropBlock, to improve model performance.
- **Reference Training Scripts**: Included are high-performance training, validation, and inference scripts that support multiple GPUs and mixed-precision training.

Overall, `timm` simplifies the process of working with deep learning models for image tasks by providing a unified interface and extensive tools for training and evaluation.

https://github.com/huggingface/pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V...

292 views · 13:54

GitHub Trends

#python #chinese #clip #computer_vision #contrastive_loss #coreml_models #deep_learning #image_text_retrieval #multi_modal #multi_modal_learning #nlp #pretrained_models #pytorch #transformers #vision_and_language_pre_training #vision_language

This project is a Chinese version of the CLIP (Contrastive Language-Image Pretraining) model, trained on a large dataset of Chinese text and images. Here’s what you need to know:
- **Capabilities**: The model helps you quickly compute text and image features, perform cross-modal retrieval (finding images based on text or vice versa), and do zero-shot image classification (classifying images without any labeled examples).
- **Performance**: The model has been tested on various datasets and shows strong performance in zero-shot image classification and cross-modal retrieval tasks.
- **Resources**: The project includes pre-trained models, training and testing code, and detailed tutorials on how to use the model for different tasks.

Overall, this project makes it easy to work with Chinese text and images using advanced AI techniques, saving you time and effort.
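
To make the zero-shot classification idea concrete, here is a small, dependency-free sketch of the scoring step: compare one image feature against one text feature per candidate label, then turn the similarities into probabilities. It does not use the project's API. The three-dimensional vectors, the `zero_shot_classify` helper, and the `scale` value of 100 are hypothetical stand-ins for the features a CLIP-style model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_feature, label_features, scale=100.0):
    """Return a probability per label from scaled cosine similarities.

    `scale` is an assumed temperature-like factor, not a value from the project.
    """
    labels = list(label_features)
    logits = [
        scale * cosine_similarity(image_feature, label_features[label])
        for label in labels
    ]
    return dict(zip(labels, softmax(logits)))


# Hypothetical toy embeddings; a real model would produce much longer vectors.
image_feature = [0.9, 0.1, 0.2]
label_features = {
    "猫": [1.0, 0.0, 0.1],    # "cat"
    "狗": [0.1, 1.0, 0.0],    # "dog"
    "汽车": [0.0, 0.2, 1.0],  # "car"
}

probs = zero_shot_classify(image_feature, label_features)
best = max(probs, key=probs.get)
print(best, probs)
```

No labeled training images are involved: the label set is defined only by the text features, so changing the candidate classes means changing the text prompts.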

https://github.com/OFA-Sys/Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation. - OFA-Sys/Chinese-CLIP

357 views · 14:30

GitHub Trends

#swift #ai #android #barcode #camera #instagram #ios #javascript #jsi #library #native #qr #qrcode #react #react_native #react_native_camera #scanner #snapchat #typescript #vision #worklet

VisionCamera is a powerful camera library for React Native that offers many useful features. You can capture photos and videos, scan QR codes and barcodes, use multiple cameras, and adjust resolutions and frame rates. It also supports advanced features like facial recognition, object detection, and real-time video chats through frame processors. Additionally, you can draw shapes, text, and filters on the camera view, and it includes smooth zooming, fast pause and resume, HDR and night modes, and a custom video pipeline. Installing it is easy with npm, and there are detailed guides and examples to help you get started. Using VisionCamera can enhance your app's camera capabilities significantly.

https://github.com/mrousavy/react-native-vision-camera

📸 A powerful, high-performance React Native Camera library. - mrousavy/react-native-vision-camera

490 views · 20:00

GitHub Trends

#python #api #automation #browser #browser_automation #computer #gpt #llm #playwright #rpa #vision #workflow

Skyvern is a tool that automates browser-based workflows using Large Language Models (LLMs) and computer vision. It can interact with websites without needing custom scripts, making it resistant to website layout changes. Here’s how it benefits you:
- **Generalization**: Skyvern can handle tasks on websites it has never seen before, such as filling out forms, extracting data, and even handling two-factor authentication (2FA).
- **Resilience**: Unlike traditional automation methods, Skyvern is less likely to break when website layouts change.
- **Ease of Use**: You can create tasks and workflows through a simple API or a user-friendly UI, without needing to write complex code.

Overall, Skyvern simplifies and stabilizes the automation of web-based tasks, making it easier to manage and scale your workflows.

https://github.com/Skyvern-AI/skyvern

GitHub - Skyvern-AI/skyvern: Automate browser based workflows with AI

330 views · 17:00
