#python #onnx #pytorch #voice_activity_detection #voice_commands #voice_control #voice_detection #voice_recognition
Silero VAD is a pre-trained, enterprise-grade Voice Activity Detector: it flags which segments of an audio stream contain speech.
https://github.com/snakers4/silero-vad
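A minimal usage sketch (not from the repo itself), assuming the torch.hub entry point the project exposes; the audio path and 16 kHz sampling rate are placeholders:
```python
import torch

# Load the pre-trained VAD model plus its helper utilities via torch.hub.
model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

# "speech.wav" is a placeholder; read_audio resamples to the requested rate.
wav = read_audio("speech.wav", sampling_rate=16000)
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_timestamps)  # list of {'start': ..., 'end': ...} offsets in samples
```
For streaming use, the same `utils` tuple provides `VADIterator`, which consumes fixed-size chunks incrementally.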
#python #agents #ai #multimodal #real_time #video #voice #voice_assistant
The Agents framework helps you build AI-driven programs that can interact with users in real-time through text, audio, images, or video. It integrates with OpenAI's Realtime API for ultra-low latency interactions and supports various plugins for speech-to-text, text-to-speech, and other AI services. You can use it to create voice assistants, transcription agents, and more, with easy deployment across local, self-hosted, or cloud environments. This makes it easier to develop interactive AI applications quickly and efficiently.
https://github.com/livekit/agents
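A minimal voice-agent sketch following the framework's quickstart pattern; the class names and the OpenAI Realtime plugin shown here are assumptions that may shift between releases of `livekit-agents`:
```python
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import openai  # requires livekit-plugins-openai

async def entrypoint(ctx: agents.JobContext):
    # Join the LiveKit room this job was dispatched for.
    await ctx.connect()

    # One session per conversation; the Realtime model covers STT+LLM+TTS end to end.
    session = AgentSession(llm=openai.realtime.RealtimeModel())
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a friendly, concise voice assistant."),
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```
The worker is then launched through the CLI the framework wires in (e.g. a local `dev` run), with LiveKit and OpenAI credentials set in the environment.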
#python #asr #audio #audio_processing #deep_learning #huggingface #language_model #pytorch #speaker_diarization #speaker_recognition #speaker_verification #speech_enhancement #speech_processing #speech_recognition #speech_separation #speech_to_text #speech_toolkit #speechrecognition #spoken_language_understanding #transformers #voice_recognition
SpeechBrain is an open-source toolkit that helps you quickly develop Conversational AI technologies, such as speech assistants, chatbots, and language models. It uses PyTorch and offers many pre-trained models and tutorials to make it easy to get started. You can train models for various tasks like speech recognition, speaker recognition, and text processing with just a few lines of code. SpeechBrain also supports GPU training, dynamic batching, and integration with HuggingFace models, making it powerful and efficient. This toolkit is beneficial because it simplifies the development process, provides extensive documentation and tutorials, and is highly customizable, making it ideal for research, prototyping, and educational purposes.
https://github.com/speechbrain/speechbrain
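To illustrate the "few lines of code" claim, a hedged transcription sketch assuming one of SpeechBrain's pre-trained LibriSpeech ASR models on HuggingFace (in releases before 1.0 the import path is `speechbrain.pretrained` instead of `speechbrain.inference`):
```python
from speechbrain.inference.ASR import EncoderDecoderASR

# Downloads the pre-trained CRDNN+RNNLM LibriSpeech model on first use.
asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)

# "my_recording.wav" is a placeholder for your own audio file.
print(asr.transcribe_file("my_recording.wav"))
```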
#python #audio_generation #audio_synthesis #audioldm #audit #fastspeech2 #hifi_gan #music_generation #naturalspeech2 #singing_voice_conversion #speech_synthesis #text_to_audio #text_to_speech #vall_e #vits #voice_conversion
Amphion is a toolkit for generating audio, music, and speech. It helps researchers and engineers, especially beginners, by providing tools for various tasks like turning text into speech (TTS), singing voice conversion (SVC), and text to audio (TTA). Amphion includes visualizations to help understand how these models work, which is very useful for learning. It also offers different vocoders to produce high-quality audio and evaluation metrics to ensure the generated audio is good. This toolkit is free to use under the MIT License and can be installed easily using Python or Docker. Using Amphion, you can create high-quality audio and music with advanced features, making it a powerful tool for both research and practical applications.
https://github.com/open-mmlab/Amphion
#python #ai_translation #dubbing #localization #video_translation #voice_cloning
VideoLingo is a powerful tool that helps translate, localize, and dub videos, making them understandable across different languages. It uses advanced technologies like WhisperX for accurate subtitle recognition and GPT for high-quality translations. The tool ensures single-line subtitles, similar to those on Netflix, and offers dubbing alignment for a more natural viewing experience. You can use it online, in Google Colab, or install it locally on your computer. This makes it easier to share videos globally without language barriers, enhancing global knowledge sharing and communication.
https://github.com/Huanshere/VideoLingo
#python #agent #ai #asr #cpp #gemini #golang #gpt_4 #gpt_4o #llm #low_latency #multimodal #nextjs14 #openai #python #rag #real_time #realtime #tts #vision #voice_assistant
The TEN Agent is a powerful tool that helps you create and manage AI agents with various capabilities like real-time vision, screen detection, and integration with services like Google Gemini Multimodal Live API, Weather Check, and Web Search. To use it, you need to set up your environment with Docker, Node.js, and specific API keys. You can follow simple steps to configure and start your agent locally. The benefits include easy integration of advanced AI features, a supportive community through Discord and GitHub discussions, and the ability to customize and extend your agents with ready-to-use extensions. This makes it easier to develop and deploy sophisticated AI applications quickly.
https://github.com/TEN-framework/TEN-Agent
#python #deep_learning #glow_tts #hifigan #melgan #multi_speaker_tts #python #pytorch #speaker_encoder #speaker_encodings #speech #speech_synthesis #tacotron #text_to_speech #tts #tts_model #vocoder #voice_cloning #voice_conversion #voice_synthesis
The new version of Coqui.ai's TTS (Text-to-Speech) toolkit ships the ⓍTTSv2 model, which supports 16 languages and delivers better performance across the board. You can fine-tune the models using the provided code and examples, and stream audio with less than 200 ms latency, making it very responsive. You also get access to over 1,100 Fairseq models and features like voice cloning and voice conversion. The update also includes faster inference with the Tortoise model and support for multiple speakers and languages. These enhancements make it easier and more efficient to generate high-quality speech from text.
https://github.com/coqui-ai/TTS
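A hedged voice-cloning example through the `TTS.api` interface with the ⓍTTSv2 multilingual model; the reference-clip path is a placeholder for your own speaker sample:
```python
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# Downloads the multilingual ⓍTTSv2 checkpoint on first use.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

tts.tts_to_file(
    text="Hello! This sentence was synthesized with a cloned voice.",
    speaker_wav="reference_voice.wav",  # short clip of the voice to clone (placeholder)
    language="en",
    file_path="output.wav",
)
```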
#python #audiobooks #chinese #docker #english #epub #gradio #linux #mac #multilingual #tts #voice_cloning #windows #xtts
This tool converts eBooks into audiobooks with chapters and metadata, supporting 1124 languages and optional voice cloning. Here’s how it benefits you:
- **Format support:** It converts eBooks in various formats (like `.epub`, `.pdf`, `.mobi`) into audiobooks with high-quality text-to-speech using tools like Calibre, ffmpeg, and XTTSv2.
- **Voice cloning:** You can clone your own voice or use default voices for the audiobook.
- **Flexible deployment:** You can run it on your local machine or use Docker for consistent results across different environments.
- **Free resources:** There are options to use free resources like Google Colab or rent a GPU for faster processing.
Make sure to use this tool responsibly with non-DRM, legally acquired eBooks.
https://github.com/DrewThomasson/ebook2audiobook
#python #text_to_speech #tts #vits #voice_clone #voice_cloneai #voice_cloning
GPT-SoVITS-WebUI is a powerful tool for converting text to speech and changing voices. Here’s what it offers:
- **Zero-shot TTS:** Convert text to speech instantly with just a 5-second vocal sample.
- **Few-shot TTS:** Train a good TTS model from as little as one minute of voice data (few-shot voice cloning).
- **Cross-lingual support:** It works in several languages including English, Japanese, Korean, Cantonese, and Chinese.
- **WebUI tools:** It includes tools like voice separation, automatic training-set segmentation, and text labeling, making it easier to create and use the models.
Using GPT-SoVITS-WebUI benefits you by allowing quick and easy voice conversion and text-to-speech with high quality and flexibility.
https://github.com/RVC-Boss/GPT-SoVITS
#python #singing_voice_conversion #voice_conversion
This tool helps you change voices in real-time or offline. It supports voice conversion, singing voice conversion, and can clone a voice with just 1-30 seconds of reference speech. You can use it for online meetings, gaming, or live streaming. The model is easy to fine-tune with custom data, requiring only one utterance per speaker. This makes it useful for creating personalized voice effects quickly and efficiently.
https://github.com/Plachtaa/seed-vc
#python #agentic_ai #agents #ai #autonomous_agents #deepseek_r1 #llm #llm_agents #voice_assistant
AgenticSeek is a free, fully local AI assistant that runs entirely on your own computer, ensuring your data stays private with no cloud or API use. It can autonomously browse the web, write and debug code in many languages, plan and execute complex tasks, and even respond to voice commands. It smartly chooses the best AI agent for each task, making it like having a personal team of experts. This local setup avoids monthly fees and protects your privacy while giving you powerful AI help for coding, research, and task management all on your device[1][2].
https://github.com/Fosowl/agenticSeek
#jupyter_notebook #android #asr #deep_learning #deep_neural_networks #deepspeech #google_speech_to_text #ios #kaldi #offline #privacy #python #raspberry_pi #speaker_identification #speaker_verification #speech_recognition #speech_to_text #speech_to_text_android #stt #voice_recognition #vosk
Vosk is a powerful tool for recognizing speech without needing the internet. It supports over 20 languages and dialects, making it useful for many different users. Vosk is small and efficient, allowing it to work on small devices like smartphones and Raspberry Pi. It can be used for things like chatbots, smart home devices, and creating subtitles for videos. This means users can have private and fast speech recognition anywhere, which is especially helpful when internet access is limited.
https://github.com/alphacep/vosk-api
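A hedged offline-transcription sketch, assuming a 16 kHz mono PCM WAV file and that Vosk is allowed to auto-download its small English model (a local model path works as well):
```python
import json
import wave

from vosk import KaldiRecognizer, Model

model = Model(lang="en-us")           # or Model("path/to/unpacked/model")
wf = wave.open("speech.wav", "rb")    # placeholder path; must be mono 16-bit PCM
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        # Print each finalized utterance as it is recognized.
        print(json.loads(rec.Result())["text"])

print(json.loads(rec.FinalResult())["text"])
```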
#python #audiobook #audiobooks #content_creation #content_creator #epub_converter #kokoro #kokoro_82m #kokoro_tts #media_generation #narrator #speech_synthesis #subtitles #text_to_audio #text_to_speech #tts #voice_synthesis
Abogen is a user-friendly tool that quickly converts ePub, PDF, or text files into natural-sounding audio with synchronized subtitles, perfect for creating audiobooks or voiceovers for social media and other projects. You can customize speech speed, choose or mix voices, generate subtitles by sentence or word, and select various audio and subtitle formats. It supports batch processing with queue mode and lets you save chapters separately or merged. Installation is straightforward on Windows, Mac, and Linux, with options for GPU acceleration. This saves you time and effort in producing high-quality audio content from text files efficiently.
https://github.com/denizsafak/abogen
#python #text_to_speech #tts #voice_clone #zero_shot_tts
OpenVoice is a free, open-source tool that lets you clone any voice using just a short audio sample, then generate speech in that voice across many languages and accents[1][5][8]. You can fine-tune how the voice sounds—adjusting emotion, accent, rhythm, pauses, and intonation—to match your needs[1][3][5]. A major benefit is “zero-shot” cloning: you can make the cloned voice speak languages it was never trained on, which is rare in voice AI[1][3][4]. The latest version, OpenVoice V2, offers even better sound quality, supports six major languages natively, and is free for both personal and commercial use[1]. This makes it easy and affordable for anyone to create realistic, customizable voice content without needing technical expertise or expensive software.
https://github.com/myshell-ai/OpenVoice
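A hedged sketch of the tone-color conversion step, modelled on the repo's demo notebooks; the checkpoint locations and audio file names are assumptions, and the source speech can come from any TTS or recording:
```python
import torch

from openvoice import se_extractor
from openvoice.api import ToneColorConverter

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Assumed checkpoint layout -- adjust to wherever the converter weights were downloaded.
converter = ToneColorConverter("checkpoints_v2/converter/config.json", device=device)
converter.load_ckpt("checkpoints_v2/converter/checkpoint.pth")

# Extract tone-colour embeddings for the source speech and the reference speaker clip.
source_se, _ = se_extractor.get_se("source_speech.wav", converter, vad=True)
target_se, _ = se_extractor.get_se("reference_speaker.wav", converter, vad=True)

# Re-render the source speech in the reference speaker's voice.
converter.convert(
    audio_src_path="source_speech.wav",
    src_se=source_se,
    tgt_se=target_se,
    output_path="cloned_output.wav",
)
```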