GitHub Trends
10.1K subscribers
15.3K links
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
Download Telegram
#python #agents #ai #multimodal #real_time #video #voice #voice_assistant

The Agents framework helps you build AI-driven programs that can interact with users in real-time through text, audio, images, or video. It integrates with OpenAI's Realtime API for ultra-low latency interactions and supports various plugins for speech-to-text, text-to-speech, and other AI services. You can use it to create voice assistants, transcription agents, and more, with easy deployment across local, self-hosted, or cloud environments. This makes it easier to develop interactive AI applications quickly and efficiently.

https://github.com/livekit/agents
#python #asr #audio #audio_processing #deep_learning #huggingface #language_model #pytorch #speaker_diarization #speaker_recognition #speaker_verification #speech_enhancement #speech_processing #speech_recognition #speech_separation #speech_to_text #speech_toolkit #speechrecognition #spoken_language_understanding #transformers #voice_recognition

SpeechBrain is an open-source toolkit that helps you quickly develop Conversational AI technologies, such as speech assistants, chatbots, and language models. It uses PyTorch and offers many pre-trained models and tutorials to make it easy to get started. You can train models for various tasks like speech recognition, speaker recognition, and text processing with just a few lines of code. SpeechBrain also supports GPU training, dynamic batching, and integration with HuggingFace models, making it powerful and efficient. This toolkit is beneficial because it simplifies the development process, provides extensive documentation and tutorials, and is highly customizable, making it ideal for research, prototyping, and educational purposes.

https://github.com/speechbrain/speechbrain
#python #audio_generation #audio_synthesis #audioldm #audit #fastspeech2 #hifi_gan #music_generation #naturalspeech2 #singing_voice_conversion #speech_synthesis #text_to_audio #text_to_speech #vall_e #vits #voice_conversion

Amphion is a toolkit for generating audio, music, and speech. It helps researchers and engineers, especially beginners, by providing tools for various tasks like turning text into speech (TTS), singing voice conversion (SVC), and text to audio (TTA). Amphion includes visualizations to help understand how these models work, which is very useful for learning. It also offers different vocoders to produce high-quality audio and evaluation metrics to ensure the generated audio is good. This toolkit is free to use under the MIT License and can be installed easily using Python or Docker. Using Amphion, you can create high-quality audio and music with advanced features, making it a powerful tool for both research and practical applications.

https://github.com/open-mmlab/Amphion
👍1
#python #ai_translation #dubbing #localization #video_translation #voice_cloning

VideoLingo is a powerful tool that helps translate, localize, and dub videos, making them understandable across different languages. It uses advanced technologies like WhisperX for accurate subtitle recognition and GPT for high-quality translations. The tool ensures single-line subtitles, similar to those on Netflix, and offers dubbing alignment for a more natural viewing experience. You can use it online, in Google Colab, or install it locally on your computer. This makes it easier to share videos globally without language barriers, enhancing global knowledge sharing and communication.

https://github.com/Huanshere/VideoLingo
#python #agent #ai #asr #cpp #gemini #golang #gpt_4 #gpt_4o #llm #low_latency #multimodal #nextjs14 #openai #python #rag #real_time #realtime #tts #vision #voice_assistant

The TEN Agent is a powerful tool that helps you create and manage AI agents with various capabilities like real-time vision, screen detection, and integration with services like Google Gemini Multimodal Live API, Weather Check, and Web Search. To use it, you need to set up your environment with Docker, Node.js, and specific API keys. You can follow simple steps to configure and start your agent locally. The benefits include easy integration of advanced AI features, a supportive community through Discord and GitHub discussions, and the ability to customize and extend your agents with ready-to-use extensions. This makes it easier to develop and deploy sophisticated AI applications quickly.

https://github.com/TEN-framework/TEN-Agent
#python #deep_learning #glow_tts #hifigan #melgan #multi_speaker_tts #python #pytorch #speaker_encoder #speaker_encodings #speech #speech_synthesis #tacotron #text_to_speech #tts #tts_model #vocoder #voice_cloning #voice_conversion #voice_synthesis

The new version of TTS (Text-to-Speech) from Coqui.ai, called TTSv2, is now available with several improvements. It supports 16 languages and has better performance overall. You can fine-tune the models using the provided code and examples. The TTS system can now stream audio with less than 200ms latency, making it very responsive. Additionally, you can use over 1,100 Fairseq models and new features like voice cloning and voice conversion. This update also includes faster inference with the Tortoise model and support for multiple speakers and languages. These enhancements make it easier and more efficient to generate high-quality speech from text.

https://github.com/coqui-ai/TTS
#python #audiobooks #chinese #docker #english #epub #gradio #linux #mac #multilingual #tts #voice_cloning #windows #xtts

This tool converts eBooks into audiobooks with chapters and metadata, supporting 1124 languages and optional voice cloning. Here’s how it benefits you It converts eBooks in various formats (like `.epub`, `.pdf`, `.mobi`) into audiobooks with high-quality text-to-speech using tools like Calibre, ffmpeg, and XTTSv2.
- **Multilingual Support** You can clone your own voice or use default voices for the audiobook.
- **User-Friendly Interface** You can run it on your local machine or use Docker for consistent results across different environments.
- **Free Resources**: There are options to use free resources like Google Colab or rent a GPU for faster processing.

Make sure to use this tool responsibly with non-DRM, legally acquired eBooks.

https://github.com/DrewThomasson/ebook2audiobook
#python #text_to_speech #tts #vits #voice_clone #voice_cloneai #voice_cloning

GPT-SoVITS-WebUI is a powerful tool for converting text to speech and changing voices. Here’s what it offers** You can convert text to speech instantly with just a 5-second vocal sample.
- **Few-shot TTS** It works in several languages including English, Japanese, Korean, Cantonese, and Chinese.
- **WebUI Tools:** It includes tools like voice separation, automatic training set segmentation, and text labeling, making it easier to create and use the models.

Using GPT-SoVITS-WebUI benefits you by allowing quick and easy voice conversions and text-to-speech functions with high quality and flexibility.

https://github.com/RVC-Boss/GPT-SoVITS