#jupyter_notebook #asr #asr_benchmark #colab #english #enterprise_grade_stt #german #pretrained_models #pytorch #silero_models #spanish #speech_recognition #speech_to_text #stt #stt_benchmark
https://github.com/snakers4/silero-models
GitHub - snakers4/silero-models: Silero Models: pre-trained text-to-speech models made embarrassingly simple
Silero Models: pre-trained text-to-speech models made embarrassingly simple - snakers4/silero-models
#python #asr #conformer #e2e_models #production_ready #pytorch #transformer
https://github.com/mobvoi/wenet
GitHub - wenet-e2e/wenet: Production First and Production Ready End-to-End Speech Recognition Toolkit
Production First and Production Ready End-to-End Speech Recognition Toolkit - wenet-e2e/wenet
#cplusplus #android #asr #deep_learning #deep_neural_networks #deepspeech #google_speech_to_text #ios #kaldi #offline #privacy #python #raspberry_pi #speaker_identification #speaker_verification #speech_recognition #speech_to_text #speech_to_text_android #stt #voice_recognition #vosk
https://github.com/alphacep/vosk-api
GitHub - alphacep/vosk-api: Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and…
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node - alphacep/vosk-api
#python #asr #audio #audio_processing #deep_learning #huggingface #language_model #pytorch #speaker_diarization #speaker_recognition #speaker_verification #speech_enhancement #speech_processing #speech_recognition #speech_separation #speech_to_text #speech_toolkit #speechrecognition #spoken_language_understanding #transformers #voice_recognition
SpeechBrain is an open-source toolkit that helps you quickly develop Conversational AI technologies, such as speech assistants, chatbots, and language models. It uses PyTorch and offers many pre-trained models and tutorials to make it easy to get started. You can train models for various tasks like speech recognition, speaker recognition, and text processing with just a few lines of code. SpeechBrain also supports GPU training, dynamic batching, and integration with HuggingFace models, making it powerful and efficient. This toolkit is beneficial because it simplifies the development process, provides extensive documentation and tutorials, and is highly customizable, making it ideal for research, prototyping, and educational purposes.
https://github.com/speechbrain/speechbrain
GitHub - speechbrain/speechbrain: A PyTorch-based Speech Toolkit
A PyTorch-based Speech Toolkit. Contribute to speechbrain/speechbrain development by creating an account on GitHub.
#python #ai #alexa #amazon_echo #anyq #asr #bci #chatgpt #google_home #gpt3 #homeassistant #muse #openai #raspeberry_pi #snowboy #speaker #tts #unit
wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot/smart speaker project. It lets makers and hackers in China quickly build personalized smart speakers. Here are the key benefits:
- **Plugin Support**: You can customize and develop your own plugins for speech recognition, synthesis, and dialogue management.
- **Smart Home Integration**: It integrates with smart home platforms such as Siri, 小爱音箱 (Xiaomi's Xiao AI speaker), and HomeAssistant, allowing voice control of smart devices.
- **Customization**: You can customize the robot's name, choose different speech recognition and synthesis plugins, and even use a brain-computer interface (BCI) for wake-up.
- **Open API**: It provides an open API for more advanced functionality.
Overall, wukong-robot is a highly customizable way to build a smart speaker, which suits anyone who wants to personalize their smart home setup.
https://github.com/wzpan/wukong-robot
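GitHub - wzpan/wukong-robot: 🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot/smart speaker project. It supports ChatGPT multi-turn conversation and may be the first open-source smart speaker project to support brain-computer interaction.
🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot/smart speaker project. It supports ChatGPT multi-turn conversation and may be the first open-source smart speaker project to support brain-computer interaction. - wzpan/wukong-robot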
#python #agent #ai #asr #cpp #gemini #golang #gpt_4 #gpt_4o #llm #low_latency #multimodal #nextjs14 #openai #python #rag #real_time #realtime #tts #vision #voice_assistant
The TEN Agent is a powerful tool that helps you create and manage AI agents with various capabilities like real-time vision, screen detection, and integration with services like Google Gemini Multimodal Live API, Weather Check, and Web Search. To use it, you need to set up your environment with Docker, Node.js, and specific API keys. You can follow simple steps to configure and start your agent locally. The benefits include easy integration of advanced AI features, a supportive community through Discord and GitHub discussions, and the ability to customize and extend your agents with ready-to-use extensions. This makes it easier to develop and deploy sophisticated AI applications quickly.
https://github.com/TEN-framework/TEN-Agent
GitHub - TEN-framework/ten-framework: Open-source framework for conversational voice AI agents
Open-source framework for conversational voice AI agents - TEN-framework/ten-framework
#cplusplus #aarch64 #android #arm32 #asr #cpp #csharp #dotnet #ios #lazarus #linux #macos #mfc #object_pascal #onnx #raspberry_pi #risc_v #speech_to_text #text_to_speech #vits #windows
sherpa-onnx supports a range of speech functions, including speech recognition, text-to-speech, and speaker identification. It runs on Android, iOS, Windows, macOS, and Linux, and has bindings for several programming languages such as C++, Python, and JavaScript. It can run locally or in the browser through WebAssembly, so you can add speech capabilities to a project regardless of the platform or language you use.
https://github.com/k2-fsa/sherpa-onnx
GitHub - k2-fsa/sherpa-onnx: Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD…
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr...
#python #asr #automatic_speech_recognition #conformer #e2e_models #production_ready #pytorch #speech_recognition #transformer #whisper
WeNet is a powerful tool for speech recognition that helps turn spoken words into text. It's designed to be easy to use and works well in real-world situations, making it great for businesses and developers. WeNet provides accurate results on many public datasets and is lightweight, meaning it doesn't require a lot of resources to run. This makes it beneficial for users who need reliable speech-to-text functionality without complex setup or maintenance.
https://github.com/wenet-e2e/wenet
GitHub - wenet-e2e/wenet: Production First and Production Ready End-to-End Speech Recognition Toolkit
Production First and Production Ready End-to-End Speech Recognition Toolkit - wenet-e2e/wenet
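The post above mentions accuracy on public datasets. For speech recognition, accuracy is commonly reported as word error rate (WER): the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration of that metric, not WeNet's own scoring code; the function name `word_error_rate` is made up for this example, and it assumes whitespace-tokenized text with no normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One missing word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Real benchmarks usually normalize case and punctuation before scoring, so numbers from this sketch are only comparable when both transcripts are prepared the same way.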
#python #asr #deeplearning #generative_ai #large_language_models #machine_translation #multimodal #neural_networks #speaker_diariazation #speaker_recognition #speech_synthesis #speech_translation #tts
NVIDIA NeMo is a powerful, easy-to-use platform for building, customizing, and deploying generative AI models like large language models (LLMs), vision language models, and speech AI. It lets you quickly train and fine-tune models using pre-built code and checkpoints, supports the latest model architectures, and works on cloud, data center, or edge environments. NeMo 2.0 is even more flexible and scalable, with Python-based configuration and modular design, making it simple to experiment and scale up. The main benefit is that you can create advanced AI applications faster, with less effort, and at lower cost, while getting high performance and easy deployment options.
https://github.com/NVIDIA/NeMo
GitHub - NVIDIA-NeMo/NeMo: A scalable generative AI framework built for researchers and developers working on Large Language Models…
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech) - NVIDIA-NeMo/NeMo
#jupyter_notebook #android #asr #deep_learning #deep_neural_networks #deepspeech #google_speech_to_text #ios #kaldi #offline #privacy #python #raspberry_pi #speaker_identification #speaker_verification #speech_recognition #speech_to_text #speech_to_text_android #stt #voice_recognition #vosk
Vosk is a powerful tool for recognizing speech without needing the internet. It supports over 20 languages and dialects, making it useful for many different users. Vosk is small and efficient, allowing it to work on small devices like smartphones and Raspberry Pi. It can be used for things like chatbots, smart home devices, and creating subtitles for videos. This means users can have private and fast speech recognition anywhere, which is especially helpful when internet access is limited.
https://github.com/alphacep/vosk-api
GitHub - alphacep/vosk-api: Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and…
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node - alphacep/vosk-api
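To make the offline workflow concrete, here is a minimal Python sketch of transcribing a WAV file with Vosk. It assumes the `vosk` package is installed and that a model has already been downloaded and unpacked locally; `model_dir` and the helper names are hypothetical. Vosk's Python examples expect mono PCM WAV input, so the sketch checks that first using only the standard library.

```python
import json
import wave


def is_mono_pcm16(path):
    """Return True if the WAV file is mono, 16-bit, uncompressed PCM."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getnchannels() == 1
            and wf.getsampwidth() == 2
            and wf.getcomptype() == "NONE"
        )


def transcribe_offline(wav_path, model_dir):
    """Transcribe a WAV file with a locally stored Vosk model (no network)."""
    # Imported lazily so the format check above works without vosk installed.
    from vosk import KaldiRecognizer, Model

    if not is_mono_pcm16(wav_path):
        raise ValueError("expected a mono 16-bit PCM WAV file")

    model = Model(model_dir)  # model_dir: hypothetical path to an unpacked model
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result())["text"])
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(json.loads(recognizer.FinalResult())["text"])
    return " ".join(piece for piece in pieces if piece)
```

A call might look like `transcribe_offline("speech.wav", "model")`, where both paths are placeholders for your own files.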
#python #asr #captions #cli #python #subtitle #subtitles #transcript #transcripts #translating_transcripts #youtube #youtube_api #youtube_asr #youtube_captions #youtube_subtitles #youtube_transcript #youtube_transcripts #youtube_video
The YouTube Transcript API is a tool that helps you get the text from YouTube videos. It's fast and easy to use, saving you time compared to watching the whole video. You can use it to make subtitles, translate text, and even analyze what's being said in videos. This is helpful for content creators who want to make their videos more accessible and for researchers who need to study video content quickly. It also supports multiple languages, making it useful for a wide range of users.
https://github.com/jdepoix/youtube-transcript-api
GitHub - jdepoix/youtube-transcript-api: This is a python API which allows you to get the transcript/subtitles for a given YouTube…
This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headles...
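As an illustration of the subtitle use case, the sketch below turns transcript entries into SubRip (SRT) text using only the standard library. It assumes each entry is a dict with `text`, `start`, and `duration` fields (times in seconds); the sample data and the function names are made up for this example, and fetching the transcript itself is left to the library.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(snippets):
    """Build SRT text from dicts with 'text', 'start', and 'duration' keys."""
    blocks = []
    for index, snippet in enumerate(snippets, start=1):
        start = snippet["start"]
        end = start + snippet["duration"]
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{snippet['text']}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [  # hypothetical transcript entries
        {"text": "hello", "start": 0.0, "duration": 1.5},
        {"text": "world", "start": 1.5, "duration": 2.0},
    ]
    print(to_srt(sample))
```

Note that each subtitle here ends at `start + duration`, so overlapping entries in the source would overlap in the output as well.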