GitHub Trends

#python #augmix #convnext #distributed_training #dual_path_networks #efficientnet #image_classification #imagenet #maxvit #mixnet #mobile_deep_learning #mobilenet_v2 #mobilenetv3 #nfnets #normalization_free_training #pretrained_models #pretrained_weights #pytorch #randaugment #resnet #vision_transformer_models

PyTorch Image Models (`timm`) is a comprehensive library that includes a wide range of state-of-the-art image models, layers, utilities, optimizers, and training scripts. Here are the key benefits `timm` offers over 300 pre-trained models from various families like Vision Transformers, ResNets, EfficientNets, and more, allowing you to choose the best model for your task.
- **Pre-trained Weights** You can easily extract features at different levels of the network using `features_only=True` and `out_indices`, making it versatile for various applications.
- **Optimizers and Schedulers** It provides several augmentation techniques like AutoAugment, RandAugment, and regularization methods like DropPath and DropBlock to enhance model performance.
- **Reference Training Scripts**: Included are high-performance training, validation, and inference scripts that support multiple GPUs and mixed-precision training.

Overall, `timm` simplifies the process of working with deep learning models for image tasks by providing a unified interface and extensive tools for training and evaluation.

https://github.com/huggingface/pytorch-image-models

GitHub

GitHub - huggingface/pytorch-image-models: The largest collection of PyTorch image encoders / backbones. Including train, eval…

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V...

292 views13:54

GitHub Trends

#python #chinese #clip #computer_vision #contrastive_loss #coreml_models #deep_learning #image_text_retrieval #multi_modal #multi_modal_learning #nlp #pretrained_models #pytorch #transformers #vision_and_language_pre_training #vision_language

This project is about a Chinese version of the CLIP (Contrastive Language-Image Pretraining) model, trained on a large dataset of Chinese text and images. Here’s what you need to know This model helps you quickly perform tasks like calculating text and image features, cross-modal retrieval (finding images based on text or vice versa), and zero-shot image classification (classifying images without any labeled examples).
- **Ease of Use** The model has been tested on various datasets and shows strong performance in zero-shot image classification and cross-modal retrieval tasks.
- **Resources**: The project includes pre-trained models, training and testing codes, and detailed tutorials on how to use the model for different tasks.

Overall, this project makes it easy to work with Chinese text and images using advanced AI techniques, saving you time and effort.

https://github.com/OFA-Sys/Chinese-CLIP

GitHub

GitHub - OFA-Sys/Chinese-CLIP: Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation. - OFA-Sys/Chinese-CLIP

357 views14:30

GitHub Trends

#swift #inference #ios #macos #pretrained_models #speech_recognition #swift #transformers #visionos #watchos #whisper

WhisperKit is a tool that helps your Apple devices recognize speech from audio files or live recordings using OpenAI's Whisper model. It works locally on your device, which means it doesn't need internet connection once set up. To use it, you can add WhisperKit to your Swift project easily through the Swift Package Manager or install a command-line version using Homebrew. This tool is beneficial because it allows you to transcribe audio quickly and efficiently right on your device, making it useful for various applications like voice assistants or transcription services.

https://github.com/argmaxinc/WhisperKit

GitHub

GitHub - argmaxinc/WhisperKit: On-device Speech Recognition for Apple Silicon

On-device Speech Recognition for Apple Silicon. Contribute to argmaxinc/WhisperKit development by creating an account on GitHub.

390 views13:30

GitHub Trends

#python #bert #deep_learning #flax #hacktoberfest #jax #language_model #language_models #machine_learning #model_hub #natural_language_processing #nlp #nlp_library #pretrained_models #python #pytorch #pytorch_transformers #seq2seq #speech_recognition #tensorflow #transformer

The Hugging Face Transformers library provides thousands of pretrained models for various tasks like text, image, and audio processing. These models can be used for tasks such as text classification, image detection, speech recognition, and more. The library supports popular deep learning frameworks like JAX, PyTorch, and TensorFlow, making it easy to switch between them.

The benefit to the user is that you can quickly download and use these pretrained models with just a few lines of code, saving time and computational resources. You can also fine-tune these models on your own datasets and share them with the community. Additionally, the library offers a simple `pipeline` API for immediate use on different inputs, making it user-friendly for both researchers and practitioners. This helps in reducing compute costs and carbon footprint while enabling high-performance results across various machine learning tasks.

https://github.com/huggingface/transformers

GitHub

GitHub - huggingface/transformers: 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models…

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training. - GitHub - huggingface/t...

354 views15:00

GitHub Trends

#python #chinese #flash_attention #large_language_models #llm #natural_language_processing #pretrained_models

The Qwen series includes powerful language models and chat models that can be used for various tasks such as chatting, content creation, information extraction, summarization, translation, coding, and more. Here are the key benefits and features Qwen offers base language models (Qwen-1.8B, Qwen-7B, Qwen-14B, Qwen-72B) and chat models (Qwen-1.8B-Chat, Qwen-7B-Chat, Qwen-14B-Chat, Qwen-72B-Chat) with different sizes and capabilities.
- **Performance** The models are available in quantized forms (Int4 and Int8) which reduce memory usage and improve inference speed without significant performance degradation.
- **System Prompt** The models can use tools, act as agents, or even interpret code, with good performance on code execution and tool-use benchmarks.
- **Long-Context Understanding** Easy deployment options include using vLLM, FastChat, Web UI demos, CLI demos, and OpenAI-style APIs.
- **Finetuning**: Scripts are provided for finetuning the models using full-parameter, LoRA, and Q-LoRA methods.

Overall, Qwen models offer robust performance, flexibility, and ease of use, making them suitable for a wide range of applications.

https://github.com/QwenLM/Qwen

GitHub

GitHub - QwenLM/Qwen: The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. - QwenLM/Qwen

440 views13:30

About

Blog

Apps

Platform