GitHub Trends
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
#jupyter_notebook #computer_vision #deep_learning #drug_discovery #forecasting #large_language_models #mxnet #nlp #paddlepaddle #pytorch #recommender_systems #speech_recognition #speech_synthesis #tensorflow #tensorflow2 #translation

This repository provides top-quality deep learning examples that are easy to train and deploy on NVIDIA GPUs. It includes a wide range of models for computer vision, natural language processing, recommender systems, speech to text, and more. These examples are updated monthly and come in Docker containers with the latest NVIDIA software, ensuring the best performance. The models support multiple GPUs and nodes, and some are optimized for Tensor Cores, which can significantly speed up training. This makes it easier for users to achieve high accuracy and performance in their deep learning projects.

https://github.com/NVIDIA/DeepLearningExamples
#python #speech_synthesis #text_to_speech #tts

The `edge-tts` module lets you use Microsoft Edge's text-to-speech service in your Python code or through commands. You can install it using `pip install edge-tts`. With this module, you can convert text to speech, change the voice and language, adjust the speech rate, volume, and pitch, and even play back the speech immediately. This is useful because it allows you to easily create audio files from text and customize how they sound, making it handy for various applications like automated announcements or educational tools.
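
As a rough illustration of the options described above, here is a minimal sketch that assumes `edge-tts` is installed and the machine has network access. The helper names `signed` and `synthesize`, the voice `en-US-AriaNeural`, and the output file name are choices made for this example, not part of the project's documentation.

```python
import asyncio


def signed(value: int, unit: str) -> str:
    """Format an offset with an explicit sign, e.g. +25% or -5Hz."""
    return f"{value:+d}{unit}"


async def synthesize(text: str, out_path: str,
                     voice: str = "en-US-AriaNeural",  # assumed voice name
                     rate_pct: int = 0, volume_pct: int = 0,
                     pitch_hz: int = 0) -> None:
    # Imported lazily so the formatting helper works without the package.
    import edge_tts

    communicate = edge_tts.Communicate(
        text,
        voice,
        rate=signed(rate_pct, "%"),      # speech rate relative to default
        volume=signed(volume_pct, "%"),  # volume relative to default
        pitch=signed(pitch_hz, "Hz"),    # pitch shift in hertz
    )
    # Calls the online service and writes the audio to out_path.
    await communicate.save(out_path)


if __name__ == "__main__":
    asyncio.run(synthesize("Hello from edge-tts.", "hello.mp3", rate_pct=10))
```

The package also installs command-line tools; something like `edge-tts --text "Hello" --write-media hello.mp3` should produce a comparable file.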

https://github.com/rany2/edge-tts
#python #audio_generation #audio_synthesis #audioldm #audit #fastspeech2 #hifi_gan #music_generation #naturalspeech2 #singing_voice_conversion #speech_synthesis #text_to_audio #text_to_speech #vall_e #vits #voice_conversion

Amphion is a toolkit for generating audio, music, and speech. It helps researchers and engineers, especially beginners, by providing tools for various tasks like turning text into speech (TTS), singing voice conversion (SVC), and text to audio (TTA). Amphion includes visualizations to help understand how these models work, which is very useful for learning. It also offers different vocoders to produce high-quality audio and evaluation metrics to ensure the generated audio is good. This toolkit is free to use under the MIT License and can be installed easily using Python or Docker. Using Amphion, you can create high-quality audio and music with advanced features, making it a powerful tool for both research and practical applications.

https://github.com/open-mmlab/Amphion
#python #deep_learning #glow_tts #hifigan #melgan #multi_speaker_tts #pytorch #speaker_encoder #speaker_encodings #speech #speech_synthesis #tacotron #text_to_speech #tts #tts_model #vocoder #voice_cloning #voice_conversion #voice_synthesis

The new version of TTS (Text-to-Speech) from Coqui.ai, called TTSv2, is now available with several improvements. It supports 16 languages and has better performance overall. You can fine-tune the models using the provided code and examples. The TTS system can now stream audio with less than 200ms latency, making it very responsive. Additionally, you can use over 1,100 Fairseq models and new features like voice cloning and voice conversion. This update also includes faster inference with the Tortoise model and support for multiple speakers and languages. These enhancements make it easier and more efficient to generate high-quality speech from text.
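
For a sense of how the voice cloning mentioned above might look in code, here is a minimal sketch assuming the `TTS` package is installed and the XTTS v2 model can be downloaded on first use. The function name `clone_voice` and all file paths are placeholders invented for this example.

```python
# Model identifier for the multilingual XTTS v2 model.
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def clone_voice(text: str, speaker_wav: str, out_path: str,
                language: str = "en") -> None:
    """Speak `text` in the voice heard in `speaker_wav` (a short reference clip)."""
    # Imported lazily because the package is large and loads model weights.
    from TTS.api import TTS

    tts = TTS(XTTS_MODEL)
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,  # reference recording to imitate
        language=language,
        file_path=out_path,
    )


if __name__ == "__main__":
    # Placeholder paths; replace with real files before running.
    clone_voice("Hello world!", "reference.wav", "output.wav")
```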

https://github.com/coqui-ai/TTS
#python #apple_silicon #audio_processing #mlx #multimodal #speech_recognition #speech_synthesis #speech_to_text #text_to_speech #transformers

MLX-Audio is a tool for converting text into speech and speech into new audio (speech-to-speech). It is built for Apple Silicon devices with M-series chips, where it runs fast and efficiently. You can choose from different languages and voices and adjust the speech speed. It also includes a web interface with 3D audio visualization, where you can play your own files. This tool is helpful for making audiobooks, interactive media, and personal projects because it is easy to use and produces high-quality audio quickly.

https://github.com/Blaizzy/mlx-audio
#python #asr #deeplearning #generative_ai #large_language_models #machine_translation #multimodal #neural_networks #speaker_diariazation #speaker_recognition #speech_synthesis #speech_translation #tts

NVIDIA NeMo is a platform for building, customizing, and deploying generative AI models such as large language models (LLMs), vision language models, and speech AI. It lets you quickly train and fine-tune models using pre-built code and checkpoints, supports the latest model architectures, and works in cloud, data center, or edge environments. NeMo 2.0 is more flexible and scalable, with Python-based configuration and a modular design that make it simple to experiment and scale up. The main benefit is that you can create advanced AI applications faster, with less effort, and at lower cost, while getting high performance and easy deployment options.

https://github.com/NVIDIA/NeMo
#python #audiobook #audiobooks #content_creation #content_creator #epub_converter #kokoro #kokoro_82m #kokoro_tts #media_generation #narrator #speech_synthesis #subtitles #text_to_audio #text_to_speech #tts #voice_synthesis

Abogen is a user-friendly tool that quickly converts ePub, PDF, or text files into natural-sounding audio with synchronized subtitles, which suits audiobooks and voiceovers for social media and other projects. You can customize speech speed, choose or mix voices, generate subtitles by sentence or word, and select various audio and subtitle formats. It supports batch processing with queue mode and lets you save chapters separately or merged. Installation is straightforward on Windows, Mac, and Linux, with options for GPU acceleration. This saves you time and effort when producing high-quality audio content from text files.

https://github.com/denizsafak/abogen