GitHub Trends
10.1K subscribers
15.3K links
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.
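
The bot's source is not shown here, so the following is only a minimal sketch of how a post in this channel's layout (hashtag line, summary, repository URL) could be assembled. The function names `to_hashtag` and `format_post` are hypothetical. The underscore rule is an assumption inferred from the tags visible in the posts (for example `#real_time` and `#jupyter_notebook`).

```python
import re


def to_hashtag(name: str) -> str:
    """Turn a language or topic name into a hashtag (hypothetical helper).

    Assumption: names are lowercased and every run of characters other
    than ASCII letters and digits becomes a single underscore.
    """
    return "#" + re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def format_post(language: str, topics: list[str], summary: str, url: str) -> str:
    """Build one post: hashtag line, blank line, summary, blank line, URL."""
    tags = " ".join(to_hashtag(n) for n in [language, *topics])
    return f"{tags}\n\n{summary}\n\n{url}"


if __name__ == "__main__":
    print(format_post(
        "Python",
        ["singing-voice-conversion", "voice-conversion"],
        "This tool helps you change voices in real time or offline.",
        "https://github.com/Plachtaa/seed-vc",
    ))
```

This sketch does not remove duplicate tags, which matches the feed: some posts repeat `#python` when the language also appears among the topics.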

Author and maintainer: https://github.com/katursis
#python #agents #ai #multimodal #real_time #video #voice #voice_assistant

The Agents framework helps you build AI-driven programs that interact with users in real time through text, audio, images, or video. It integrates with OpenAI's Realtime API for ultra-low-latency interactions and supports various plugins for speech-to-text, text-to-speech, and other AI services. You can use it to create voice assistants, transcription agents, and more, and deploy them in local, self-hosted, or cloud environments. This makes it easier to develop interactive AI applications quickly and efficiently.

https://github.com/livekit/agents
#python #asr #audio #audio_processing #deep_learning #huggingface #language_model #pytorch #speaker_diarization #speaker_recognition #speaker_verification #speech_enhancement #speech_processing #speech_recognition #speech_separation #speech_to_text #speech_toolkit #speechrecognition #spoken_language_understanding #transformers #voice_recognition

SpeechBrain is an open-source toolkit that helps you quickly develop Conversational AI technologies, such as speech assistants, chatbots, and language models. It uses PyTorch and offers many pre-trained models and tutorials to make it easy to get started. You can train models for various tasks like speech recognition, speaker recognition, and text processing with just a few lines of code. SpeechBrain also supports GPU training, dynamic batching, and integration with HuggingFace models, making it powerful and efficient. This toolkit is beneficial because it simplifies the development process, provides extensive documentation and tutorials, and is highly customizable, making it ideal for research, prototyping, and educational purposes.

https://github.com/speechbrain/speechbrain
#python #audio_generation #audio_synthesis #audioldm #audit #fastspeech2 #hifi_gan #music_generation #naturalspeech2 #singing_voice_conversion #speech_synthesis #text_to_audio #text_to_speech #vall_e #vits #voice_conversion

Amphion is a toolkit for generating audio, music, and speech. It helps researchers and engineers, especially beginners, by providing tools for tasks such as text-to-speech (TTS), singing voice conversion (SVC), and text-to-audio (TTA). Amphion includes visualizations that help explain how these models work, which is very useful for learning. It also offers several vocoders for producing high-quality audio and evaluation metrics for assessing the quality of the generated audio. This toolkit is free to use under the MIT License and can be installed easily using Python or Docker. Using Amphion, you can create high-quality audio and music with advanced features, making it a powerful tool for both research and practical applications.

https://github.com/open-mmlab/Amphion
#python #ai_translation #dubbing #localization #video_translation #voice_cloning

VideoLingo is a powerful tool that helps translate, localize, and dub videos, making them understandable across different languages. It uses advanced technologies like WhisperX for accurate subtitle recognition and GPT for high-quality translations. The tool ensures single-line subtitles, similar to those on Netflix, and offers dubbing alignment for a more natural viewing experience. You can use it online, in Google Colab, or install it locally on your computer. This makes it easier to share videos globally without language barriers, enhancing global knowledge sharing and communication.

https://github.com/Huanshere/VideoLingo
#python #agent #ai #asr #cpp #gemini #golang #gpt_4 #gpt_4o #llm #low_latency #multimodal #nextjs14 #openai #python #rag #real_time #realtime #tts #vision #voice_assistant

The TEN Agent is a powerful tool that helps you create and manage AI agents with various capabilities like real-time vision, screen detection, and integration with services like Google Gemini Multimodal Live API, Weather Check, and Web Search. To use it, you need to set up your environment with Docker, Node.js, and specific API keys. You can follow simple steps to configure and start your agent locally. The benefits include easy integration of advanced AI features, a supportive community through Discord and GitHub discussions, and the ability to customize and extend your agents with ready-to-use extensions. This makes it easier to develop and deploy sophisticated AI applications quickly.

https://github.com/TEN-framework/TEN-Agent
#python #deep_learning #glow_tts #hifigan #melgan #multi_speaker_tts #python #pytorch #speaker_encoder #speaker_encodings #speech #speech_synthesis #tacotron #text_to_speech #tts #tts_model #vocoder #voice_cloning #voice_conversion #voice_synthesis

The new version of TTS (Text-to-Speech) from Coqui.ai, called TTSv2, is now available with several improvements. It supports 16 languages and has better performance overall. You can fine-tune the models using the provided code and examples. The TTS system can now stream audio with less than 200ms latency, making it very responsive. Additionally, you can use over 1,100 Fairseq models and new features like voice cloning and voice conversion. This update also includes faster inference with the Tortoise model and support for multiple speakers and languages. These enhancements make it easier and more efficient to generate high-quality speech from text.

https://github.com/coqui-ai/TTS
#python #audiobooks #chinese #docker #english #epub #gradio #linux #mac #multilingual #tts #voice_cloning #windows #xtts

This tool converts eBooks into audiobooks with chapters and metadata, supporting 1124 languages and optional voice cloning. Here’s how it benefits you:
- **Format Support**: It converts eBooks in various formats (like `.epub`, `.pdf`, `.mobi`) into audiobooks with high-quality text-to-speech using tools like Calibre, ffmpeg, and XTTSv2.
- **Voice Cloning**: You can clone your own voice or use default voices for the audiobook.
- **Local or Docker**: You can run it on your local machine or use Docker for consistent results across different environments.
- **Free Resources**: There are options to use free resources like Google Colab or rent a GPU for faster processing.

Make sure to use this tool responsibly with non-DRM, legally acquired eBooks.

https://github.com/DrewThomasson/ebook2audiobook
#python #text_to_speech #tts #vits #voice_clone #voice_cloneai #voice_cloning

GPT-SoVITS-WebUI is a powerful tool for converting text to speech and changing voices. Here’s what it offers:
- **Zero-shot TTS:** You can convert text to speech instantly with just a 5-second vocal sample.
- **Cross-lingual Support:** It works in several languages including English, Japanese, Korean, Cantonese, and Chinese.
- **WebUI Tools:** It includes tools like voice separation, automatic training set segmentation, and text labeling, making it easier to create and use the models.

Using GPT-SoVITS-WebUI benefits you by allowing quick and easy voice conversions and text-to-speech functions with high quality and flexibility.

https://github.com/RVC-Boss/GPT-SoVITS
#python #singing_voice_conversion #voice_conversion

This tool helps you change voices in real time or offline. It supports voice conversion and singing voice conversion, and can clone a voice from just 1-30 seconds of reference speech. You can use it for online meetings, gaming, or live streaming. The model is easy to fine-tune with custom data, requiring only one utterance per speaker. This makes it useful for creating personalized voice effects quickly and efficiently.

https://github.com/Plachtaa/seed-vc
#python #agentic_ai #agents #ai #autonomous_agents #deepseek_r1 #llm #llm_agents #voice_assistant

AgenticSeek is a free, fully local AI assistant that runs entirely on your own computer, ensuring your data stays private with no cloud or API use. It can autonomously browse the web, write and debug code in many languages, plan and execute complex tasks, and even respond to voice commands. It smartly chooses the best AI agent for each task, making it like having a personal team of experts. This local setup avoids monthly fees and protects your privacy while giving you powerful AI help for coding, research, and task management all on your device[1][2].

https://github.com/Fosowl/agenticSeek
#jupyter_notebook #android #asr #deep_learning #deep_neural_networks #deepspeech #google_speech_to_text #ios #kaldi #offline #privacy #python #raspberry_pi #speaker_identification #speaker_verification #speech_recognition #speech_to_text #speech_to_text_android #stt #voice_recognition #vosk

Vosk is a powerful tool for recognizing speech without needing the internet. It supports over 20 languages and dialects, making it useful for many different users. Vosk is small and efficient, allowing it to work on small devices like smartphones and Raspberry Pi. It can be used for things like chatbots, smart home devices, and creating subtitles for videos. This means users can have private and fast speech recognition anywhere, which is especially helpful when internet access is limited.

https://github.com/alphacep/vosk-api
#python #audiobook #audiobooks #content_creation #content_creator #epub_converter #kokoro #kokoro_82m #kokoro_tts #media_generation #narrator #speech_synthesis #subtitles #text_to_audio #text_to_speech #tts #voice_synthesis

Abogen is a user-friendly tool that quickly converts ePub, PDF, or text files into natural-sounding audio with synchronized subtitles, perfect for creating audiobooks or voiceovers for social media and other projects. You can customize speech speed, choose or mix voices, generate subtitles by sentence or word, and select various audio and subtitle formats. It supports batch processing with queue mode and lets you save chapters separately or merged. Installation is straightforward on Windows, Mac, and Linux, with options for GPU acceleration. This saves you time and effort in producing high-quality audio content from text files efficiently.

https://github.com/denizsafak/abogen
#python #text_to_speech #tts #voice_clone #zero_shot_tts

OpenVoice is a free, open-source tool that lets you clone any voice using just a short audio sample, then generate speech in that voice across many languages and accents[1][5][8]. You can fine-tune how the voice sounds—adjusting emotion, accent, rhythm, pauses, and intonation—to match your needs[1][3][5]. A major benefit is “zero-shot” cloning: you can make the cloned voice speak languages it was never trained on, which is rare in voice AI[1][3][4]. The latest version, OpenVoice V2, offers even better sound quality, supports six major languages natively, and is free for both personal and commercial use[1]. This makes it easy and affordable for anyone to create realistic, customizable voice content without needing technical expertise or expensive software.

https://github.com/myshell-ai/OpenVoice