#python #conformer #modelscope #paraformer #punctuation #pytorch #rnnt #speaker_diarization #speech_recognition #vad
https://github.com/alibaba-damo-academy/FunASR
A fundamental end-to-end speech recognition toolkit with open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, and more.
#python #automatic_speech_recognition #docker #openai_whisper #speech_recognition #speech_to_text
https://github.com/ahmetoner/whisper-asr-webservice
OpenAI Whisper ASR Webservice API.
#python #deep_learning #pytorch #speech #speech_processing #speech_synthesis #text_to_speech #toolkit #tts
https://github.com/DigitalPhonetics/IMS-Toucan
Controllable and fast Text-to-Speech for over 7000 languages!
#python #audio #deep_learning #noise_suppression #pytorch #rust #speech #speech_enhancement
https://github.com/Rikorose/DeepFilterNet
Noise suppression using deep filtering.
#swift #inference #ios #macos #pretrained_models #speech_recognition #transformers #visionos #watchos #whisper
WhisperKit runs OpenAI's Whisper model on Apple devices to transcribe speech from audio files or live recordings. It works locally on the device, so it needs no internet connection once set up. You can add WhisperKit to a Swift project through the Swift Package Manager or install a command-line version with Homebrew. Because transcription happens on the device, it suits applications like voice assistants or transcription services.
https://github.com/argmaxinc/WhisperKit
#python #bert #deep_learning #flax #hacktoberfest #jax #language_model #language_models #machine_learning #model_hub #natural_language_processing #nlp #nlp_library #pretrained_models #pytorch #pytorch_transformers #seq2seq #speech_recognition #tensorflow #transformer
The Hugging Face Transformers library provides thousands of pretrained models for text, image, and audio tasks such as text classification, object detection, and speech recognition. It supports JAX, PyTorch, and TensorFlow, so a model can be moved between frameworks.
You can download and use these pretrained models in a few lines of code, which saves training time and compute. You can also fine-tune them on your own datasets and share them with the community. The library's `pipeline` API runs a model on raw inputs directly, which suits both researchers and practitioners. Reusing pretrained models also lowers compute costs and carbon footprint.
https://github.com/huggingface/transformers
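A minimal sketch of the `pipeline` API applied to speech recognition. The `transcribe` helper, its `pipeline_factory` parameter, and the `openai/whisper-tiny` checkpoint are illustrative choices, not something the post prescribes; the sketch assumes `transformers` plus a backend such as PyTorch are installed, and that ffmpeg is available for decoding audio files.

```python
def transcribe(audio_path, model_id="openai/whisper-tiny", pipeline_factory=None):
    """Return the transcript of an audio file using a pretrained ASR model.

    pipeline_factory is a hypothetical hook for swapping in a stub;
    by default the real transformers.pipeline is used.
    """
    if pipeline_factory is None:
        # Imported lazily so the module loads even without transformers installed.
        from transformers import pipeline as pipeline_factory

    # The first call downloads the checkpoint from the Hugging Face Hub.
    asr = pipeline_factory("automatic-speech-recognition", model=model_id)

    # ASR pipelines return a dict whose "text" key holds the transcript.
    result = asr(audio_path)
    return result["text"].strip()


# Example usage (needs network access on first run):
#   print(transcribe("meeting.wav"))
```

Changing the task string and the checkpoint is usually all it takes to move to another task, such as `"text-classification"`.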
#jupyter_notebook #computer_vision #deep_learning #drug_discovery #forecasting #large_language_models #mxnet #nlp #paddlepaddle #pytorch #recommender_systems #speech_recognition #speech_synthesis #tensorflow #tensorflow2 #translation
This repository provides top-quality deep learning examples that are easy to train and deploy on NVIDIA GPUs. It includes a wide range of models for computer vision, natural language processing, recommender systems, speech to text, and more. These examples are updated monthly and come in Docker containers with the latest NVIDIA software, ensuring the best performance. The models support multiple GPUs and nodes, and some are optimized for Tensor Cores, which can significantly speed up training. This makes it easier for users to achieve high accuracy and performance in their deep learning projects.
https://github.com/NVIDIA/DeepLearningExamples
#python #speech_synthesis #text_to_speech #tts
The `edge-tts` module lets you use Microsoft Edge's online text-to-speech service from Python code or from the command line. Install it with `pip install edge-tts`. It converts text to speech, lets you choose the voice and language, adjust the speech rate, volume, and pitch, and play the result back immediately. That makes it handy for generating audio files from text, for example automated announcements or educational tools.
https://github.com/rany2/edge-tts
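A rough sketch of saving speech to a file from Python with adjusted rate, volume, and pitch. The helper names `signed` and `synthesize` are made up for illustration, and `en-US-AriaNeural` is just one example voice; the sketch assumes `edge-tts` is installed and the online service is reachable.

```python
import asyncio


def signed(value: int, unit: str) -> str:
    """Format an offset the way edge-tts expects it, e.g. +10% or -5Hz."""
    return f"{value:+d}{unit}"


async def synthesize(text, out_path, voice="en-US-AriaNeural",
                     rate=0, volume=0, pitch=0):
    """Write `text` as speech to `out_path` (an MP3 file)."""
    import edge_tts  # lazy import: only needed when audio is actually generated

    communicate = edge_tts.Communicate(
        text,
        voice,
        rate=signed(rate, "%"),      # speaking speed relative to the default
        volume=signed(volume, "%"),  # loudness relative to the default
        pitch=signed(pitch, "Hz"),   # pitch shift in hertz
    )
    await communicate.save(out_path)


# Example usage (requires network access):
#   asyncio.run(synthesize("Doors closing.", "announce.mp3", rate=10, pitch=-5))
```

The offsets are relative to the voice's defaults, so `rate=10` means ten percent faster and `0` leaves a setting unchanged.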
#python #asr #audio #audio_processing #deep_learning #huggingface #language_model #pytorch #speaker_diarization #speaker_recognition #speaker_verification #speech_enhancement #speech_processing #speech_recognition #speech_separation #speech_to_text #speech_toolkit #speechrecognition #spoken_language_understanding #transformers #voice_recognition
SpeechBrain is an open-source toolkit that helps you quickly develop Conversational AI technologies, such as speech assistants, chatbots, and language models. It uses PyTorch and offers many pre-trained models and tutorials to make it easy to get started. You can train models for various tasks like speech recognition, speaker recognition, and text processing with just a few lines of code. SpeechBrain also supports GPU training, dynamic batching, and integration with HuggingFace models, making it powerful and efficient. This toolkit is beneficial because it simplifies the development process, provides extensive documentation and tutorials, and is highly customizable, making it ideal for research, prototyping, and educational purposes.
https://github.com/speechbrain/speechbrain
#python #audio_generation #audio_synthesis #audioldm #audit #fastspeech2 #hifi_gan #music_generation #naturalspeech2 #singing_voice_conversion #speech_synthesis #text_to_audio #text_to_speech #vall_e #vits #voice_conversion
Amphion is a toolkit for generating audio, music, and speech. It helps researchers and engineers, especially beginners, by providing tools for various tasks like turning text into speech (TTS), singing voice conversion (SVC), and text to audio (TTA). Amphion includes visualizations to help understand how these models work, which is very useful for learning. It also offers different vocoders to produce high-quality audio and evaluation metrics to ensure the generated audio is good. This toolkit is free to use under the MIT License and can be installed easily using Python or Docker. Using Amphion, you can create high-quality audio and music with advanced features, making it a powerful tool for both research and practical applications.
https://github.com/open-mmlab/Amphion
#python #deep_learning #glow_tts #hifigan #melgan #multi_speaker_tts #pytorch #speaker_encoder #speaker_encodings #speech #speech_synthesis #tacotron #text_to_speech #tts #tts_model #vocoder #voice_cloning #voice_conversion #voice_synthesis
Coqui.ai's new TTS (Text-to-Speech) model, XTTS v2 (ⓍTTSv2), is now available with several improvements. It supports 16 languages and performs better overall. You can fine-tune the models using the provided code and examples. The model can stream audio with less than 200ms latency, which makes it very responsive. The toolkit also gives access to over 1,100 Fairseq models and adds voice cloning and voice conversion. This update further includes faster inference with the Tortoise model and support for multiple speakers and languages. Together, these changes make it easier and more efficient to generate high-quality speech from text.
https://github.com/coqui-ai/TTS
#python #ai #llm #slm #speech
Ultravox is a fast multimodal AI model that understands both text and human speech without a separate speech recognition step. It starts responding to audio content in about 150 milliseconds, which makes it useful for real-time voice conversations. You can try it through a demo page or run it locally on your computer. You can also train it on your own audio data to adapt it to other languages or specific needs. Overall, Ultravox simplifies and speeds up voice interactions between humans and AI systems.
https://github.com/fixie-ai/ultravox
#python #realtime #speech_to_text
RealtimeSTT is a library that converts speech to text in real time. It listens to your microphone and transcribes what you say immediately. Its key features:
- **Fast Transcription** It uses models like Faster-Whisper for quick and precise transcription.
- **Voice Activity Detection** It detects when you start and stop speaking.
- **Wake Word Activation** You can set a specific word, like "Jarvis," to start the recording.
- **Configuration** You can adjust settings like sensitivity and model size, and use a GPU for better performance.
Install it with `pip install RealtimeSTT`; the repository includes examples to get you started. The library suits voice-controlled applications or any project that needs real-time speech-to-text.
https://github.com/KoljaB/RealtimeSTT
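The settings mentioned above (model size, sensitivity, wake word, GPU) could be wired up roughly as follows. `build_recorder_options` and `listen_once` are hypothetical helpers written for this sketch; it assumes `RealtimeSTT` is installed, a microphone is available, and that the keyword names match the installed version of `AudioToTextRecorder`.

```python
def build_recorder_options(model="tiny", wake_word=None, use_gpu=False,
                           silero_sensitivity=0.4):
    """Collect keyword arguments for AudioToTextRecorder in one place."""
    options = {
        "model": model,                            # Whisper model size
        "device": "cuda" if use_gpu else "cpu",    # where inference runs
        "silero_sensitivity": silero_sensitivity,  # voice activity sensitivity
    }
    if wake_word:
        # With a wake word set, recording starts only after the word is heard.
        options["wake_words"] = wake_word.lower()
    return options


def listen_once(**kwargs):
    """Record one utterance from the microphone and return its transcript."""
    from RealtimeSTT import AudioToTextRecorder  # lazy import, needs audio hardware

    recorder = AudioToTextRecorder(**build_recorder_options(**kwargs))
    try:
        return recorder.text()  # blocks until speech has been transcribed
    finally:
        recorder.shutdown()


# Example usage (call from under an `if __name__ == "__main__":` guard,
# since the library uses multiprocessing):
#   print(listen_once(wake_word="Jarvis"))
```

Keeping the options in a plain dictionary makes it easy to log or tweak the configuration before any audio device is opened.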
#cplusplus #aarch64 #android #arm32 #asr #cpp #csharp #dotnet #ios #lazarus #linux #macos #mfc #object_pascal #onnx #raspberry_pi #risc_v #speech_to_text #text_to_speech #vits #windows
sherpa-onnx supports speech functions such as speech recognition, text-to-speech, and speaker identification. It runs on Android, iOS, Windows, macOS, and Linux, and can be used from several programming languages, including C++, Python, and JavaScript. It runs locally or in the browser through WebAssembly, so you can integrate speech capabilities into a project regardless of platform or language.
https://github.com/k2-fsa/sherpa-onnx