#jupyter_notebook #dataset_analysis #deep_learning #gantts #glow_tts #melgan #multiband_melgan #python #pytorch #speaker_encoder #speech #tacotron #tacotron2 #tensorflow2 #text_to_speech #tts #vocoder
https://github.com/mozilla/TTS
GitHub - mozilla/TTS: :robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
#python #align_tts #deep_learning #glow_tts #hifigan #melgan #melgan_stft #pytorch #speaker_encoder #speaker_encodings #speech #tacotron #tensorflow2 #text_to_speech #tts #vocoder
https://github.com/coqui-ai/TTS
GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
#cplusplus #android #asr #deep_learning #deep_neural_networks #deepspeech #google_speech_to_text #ios #kaldi #offline #privacy #python #raspberry_pi #speaker_identification #speaker_verification #speech_recognition #speech_to_text #speech_to_text_android #stt #voice_recognition #vosk
https://github.com/alphacep/vosk-api
GitHub - alphacep/vosk-api: Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
#jupyter_notebook #overlapped_speech_detection #pretrained_models #pytorch #speaker_change_detection #speaker_diarization #speaker_embedding #speaker_recognition #speaker_verification #speech_activity_detection #speech_processing #voice_activity_detection
https://github.com/pyannote/pyannote-audio
GitHub - pyannote/pyannote-audio: Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
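A rough sketch of how the pretrained diarization pipeline is typically used; the pipeline id, the access-token handling, and the file name are assumptions, so check the repo's README for the current ones:

```python
from pyannote.audio import Pipeline

# Load a pretrained speaker-diarization pipeline from the Hugging Face Hub.
# The pipeline id and the token are placeholders/assumptions; recent pyannote
# pipelines are gated and require accepting their license first.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",
)

# Run "who spoke when" on a local recording and print the speaker turns.
diarization = pipeline("meeting.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:6.1f}s -> {turn.end:6.1f}s  {speaker}")
```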
#python #conformer #modelscope #paraformer #punctuation #pytorch #rnnt #speaker_diarization #speech_recognition #vad
https://github.com/alibaba-damo-academy/FunASR
GitHub - modelscope/FunASR: A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
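A minimal usage sketch, assuming FunASR's AutoModel interface and the model names shown in its examples (treat the exact ids and the file path as placeholders):

```python
from funasr import AutoModel

# Paraformer ASR combined with voice activity detection and punctuation
# restoration; the model names are assumptions taken from FunASR's examples.
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")

# Transcribe a local file; generate() returns a list of result dicts.
result = model.generate(input="speech.wav")
print(result[0]["text"])
```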
#python #asr #audio #audio_processing #deep_learning #huggingface #language_model #pytorch #speaker_diarization #speaker_recognition #speaker_verification #speech_enhancement #speech_processing #speech_recognition #speech_separation #speech_to_text #speech_toolkit #speechrecognition #spoken_language_understanding #transformers #voice_recognition
SpeechBrain is an open-source toolkit that helps you quickly develop conversational AI technologies such as speech assistants, chatbots, and language models. Built on PyTorch, it ships many pre-trained models and tutorials that make it easy to get started, and you can train models for tasks like speech recognition, speaker recognition, and text processing with just a few lines of code. It also supports GPU training, dynamic batching, and integration with Hugging Face models. Because it is well documented and highly customizable, it is a good fit for research, prototyping, and teaching.
https://github.com/speechbrain/speechbrain
GitHub - speechbrain/speechbrain: A PyTorch-based Speech Toolkit
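A minimal sketch of the "few lines of code" inference path, assuming one of SpeechBrain's published pretrained ASR models; newer releases expose the same classes under `speechbrain.inference`, and the model id and file name are placeholders:

```python
from speechbrain.pretrained import EncoderDecoderASR

# Download a pretrained English ASR model from the Hugging Face Hub; the
# model id is an assumption taken from SpeechBrain's published recipes.
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)

# Transcribe a local audio file in one call.
print(asr_model.transcribe_file("example.wav"))
```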
#python #ai #alexa #amazon_echo #anyq #asr #bci #chatgpt #google_home #gpt3 #homeassistant #muse #openai #raspeberry_pi #snowboy #speaker #tts #unit
wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project that lets makers and hackers in China quickly build personalized smart speakers. Key benefits:
- **Pluggable architecture**: you can write and customize your own plugins for speech recognition, speech synthesis, and dialogue management (a hedged plugin sketch follows at the end of this entry).
- **Smart-home integration**: it works with Siri, 小爱音箱 (Xiaomi's XiaoAI speaker), and HomeAssistant, so you can control smart devices by voice.
- **Deep customization**: you can change the robot's name, choose different speech recognition and synthesis plugins, and even use a brain-computer interface (BCI) for wake-up.
- **Open API**: it provides an open API for more advanced functionality.
Overall, wukong-robot offers a highly customizable and flexible way to build a smart speaker, making it a great choice for anyone who wants to personalize their smart-home experience.
https://github.com/wzpan/wukong-robot
GitHub - wzpan/wukong-robot: 🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project; it supports multi-turn ChatGPT conversations and may be the first open-source smart speaker project to support brain-computer interaction.
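A hypothetical sketch of what a custom plugin might look like; the module path, the Plugin/AbstractPlugin contract, and the say() helper are assumptions about the project's plugin SDK, so check the official plugin docs before copying:

```python
# Hypothetical wukong-robot custom plugin (e.g. dropped into custom/plugins/).
# The class name, SLUG, isValid()/handle(), and self.say() are assumptions
# about the plugin SDK, not verified against the current codebase.
from robot.sdk.AbstractPlugin import AbstractPlugin


class Plugin(AbstractPlugin):

    SLUG = "hello_world"  # hypothetical plugin identifier

    def isValid(self, text, parsed):
        # Claim any recognized utterance that contains the trigger phrase.
        return "hello robot" in text.lower()

    def handle(self, text, parsed):
        # Reply through whatever TTS engine the robot is configured to use.
        self.say("Hello, this is a custom wukong-robot plugin speaking.")
```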
#python #deep_learning #glow_tts #hifigan #melgan #multi_speaker_tts #pytorch #speaker_encoder #speaker_encodings #speech #speech_synthesis #tacotron #text_to_speech #tts #tts_model #vocoder #voice_cloning #voice_conversion #voice_synthesis
Coqui.ai's TTS toolkit now ships XTTS v2, the new version of its flagship text-to-speech model, with several improvements. It supports 16 languages and performs better across the board. You can fine-tune the models using the provided code and examples, and XTTS can stream audio with less than 200 ms latency, making it very responsive. You can also use over 1,100 Fairseq models, plus features like voice cloning and voice conversion. The release further includes faster inference for the Tortoise model and support for multiple speakers and languages, making it easier and more efficient to generate high-quality speech from text.
https://github.com/coqui-ai/TTS
GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
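A minimal sketch of voice cloning with the Python API, assuming the XTTS v2 model id from Coqui's model zoo; the file paths are placeholders:

```python
from TTS.api import TTS

# Load the multilingual XTTS v2 model (the release described above).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Clone the voice from a short reference clip and write the result to disk.
tts.tts_to_file(
    text="Hello, this is a cloned voice speaking.",
    speaker_wav="reference_speaker.wav",
    language="en",
    file_path="output.wav",
)
```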
#python #asr #deeplearning #generative_ai #large_language_models #machine_translation #multimodal #neural_networks #speaker_diarization #speaker_recognition #speech_synthesis #speech_translation #tts
NVIDIA NeMo is a powerful, easy-to-use platform for building, customizing, and deploying generative AI models such as large language models (LLMs), vision-language models, and speech AI. It lets you quickly train and fine-tune models using pre-built code and checkpoints, supports the latest model architectures, and runs in cloud, data-center, or edge environments. NeMo 2.0 is even more flexible and scalable, with Python-based configuration and a modular design that makes it simple to experiment and scale up. The main benefit is that you can create advanced AI applications faster, with less effort and at lower cost, while getting high performance and easy deployment options.
https://github.com/NVIDIA/NeMo
GitHub - NVIDIA-NeMo/NeMo: A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
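A small inference sketch using one of NeMo's published ASR checkpoints; the checkpoint name and file path are assumptions, and training or fine-tuning follows the same collections API:

```python
import nemo.collections.asr as nemo_asr

# Download a pretrained English CTC model and transcribe local audio; the
# checkpoint name is an assumption taken from NVIDIA's published models.
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(
    model_name="QuartzNet15x5Base-En"
)
transcripts = asr_model.transcribe(["audio_sample.wav"])
print(transcripts[0])
```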
#jupyter_notebook #android #asr #deep_learning #deep_neural_networks #deepspeech #google_speech_to_text #ios #kaldi #offline #privacy #python #raspberry_pi #speaker_identification #speaker_verification #speech_recognition #speech_to_text #speech_to_text_android #stt #voice_recognition #vosk
Vosk is a powerful tool for recognizing speech without needing the internet. It supports over 20 languages and dialects, making it useful for many different users. Vosk is small and efficient, allowing it to work on small devices like smartphones and Raspberry Pi. It can be used for things like chatbots, smart home devices, and creating subtitles for videos. This means users can have private and fast speech recognition anywhere, which is especially helpful when internet access is limited.
https://github.com/alphacep/vosk-api
GitHub - alphacep/vosk-api: Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
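A minimal offline-transcription sketch in Python; the model directory and WAV file are placeholders (download one of the small language models from the Vosk site, unpack it, and point Model() at it):

```python
import json
import wave

from vosk import KaldiRecognizer, Model

# Offline recognition: feed 16 kHz mono PCM audio to the recognizer in chunks.
model = Model("model")            # placeholder: unpacked Vosk model directory
wf = wave.open("test.wav", "rb")  # placeholder: 16 kHz mono PCM WAV file
rec = KaldiRecognizer(model, wf.getframerate())

results = []
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):  # an utterance ended; collect its text
        results.append(json.loads(rec.Result())["text"])
results.append(json.loads(rec.FinalResult())["text"])

print(" ".join(r for r in results if r))
```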