GitHub Trends
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
#python #asr #audio #audio_processing #deep_learning #huggingface #language_model #pytorch #speaker_diarization #speaker_recognition #speaker_verification #speech_enhancement #speech_processing #speech_recognition #speech_separation #speech_to_text #speech_toolkit #speechrecognition #spoken_language_understanding #transformers #voice_recognition

SpeechBrain is an open-source toolkit that helps you quickly develop Conversational AI technologies, such as speech assistants, chatbots, and language models. It uses PyTorch and offers many pre-trained models and tutorials to make it easy to get started. You can train models for various tasks like speech recognition, speaker recognition, and text processing with just a few lines of code. SpeechBrain also supports GPU training, dynamic batching, and integration with HuggingFace models, making it powerful and efficient. This toolkit is beneficial because it simplifies the development process, provides extensive documentation and tutorials, and is highly customizable, making it ideal for research, prototyping, and educational purposes.

https://github.com/speechbrain/speechbrain
#python #realtime #speech_to_text

RealtimeSTT is a library that converts speech to text in real time. It listens to your microphone and transcribes what you say immediately. Here are the key benefits:
- **Voice Activity Detection:** It detects when you start and stop speaking.
- **Realtime Transcription:** It uses advanced models like Faster-Whisper for quick and precise transcription.
- **Wake Word Activation:** You can set a specific word, like "Jarvis," to start the recording.
- **Customization:** You can adjust settings like sensitivity and model size, and use a GPU for better performance.

Installing it is simple with `pip install RealtimeSTT`, and it includes examples to get you started quickly. This library is great for building voice-controlled applications or any project needing real-time speech-to-text functionality.
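
To make the voice activity detection idea concrete, here is a minimal sketch of an energy-threshold detector in plain Python. It is not RealtimeSTT's implementation, which is more sophisticated; the function names, the `threshold` value, and the `hangover` parameter are all assumptions chosen for illustration.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(samples, frame_size=160, threshold=500.0, hangover=2):
    """Return (start, end) sample indices of spans whose energy reaches threshold.

    hangover is the number of consecutive quiet frames tolerated
    before an open span is closed (hypothetical parameter).
    """
    segments = []
    start = None        # start index of the span currently open, if any
    last_loud_end = 0   # end index of the most recent loud frame
    quiet = 0           # consecutive quiet frames since the last loud one
    for offset in range(0, len(samples), frame_size):
        frame = samples[offset:offset + frame_size]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = offset
            last_loud_end = offset + len(frame)
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover:
                segments.append((start, last_loud_end))
                start = None
                quiet = 0
    if start is not None:
        segments.append((start, last_loud_end))
    return segments


# Silence, then a loud burst, then silence again.
signal = [0] * 480 + [1000, -1000] * 160 + [0] * 800
print(detect_speech(signal))  # [(480, 800)]
```

A recorder built on this idea would start buffering audio at the first index of a span and hand the buffer to the transcription model once the span closes.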

https://github.com/KoljaB/RealtimeSTT
#cplusplus #aarch64 #android #arm32 #asr #cpp #csharp #dotnet #ios #lazarus #linux #macos #mfc #object_pascal #onnx #raspberry_pi #risc_v #speech_to_text #text_to_speech #vits #windows

sherpa-onnx supports speech functions such as speech recognition, text-to-speech, and speaker identification, among others. It runs on multiple platforms, including Android, iOS, Windows, macOS, and Linux, and supports several programming languages, such as C++, Python, and JavaScript. You can run it locally or through WebAssembly, so you can integrate speech capabilities into your projects regardless of the platform or programming language you use.

https://github.com/k2-fsa/sherpa-onnx
#python #artificial_intelligence #llm #real_time #speech_to_text #text_to_speech

FastRTC is a Python library that helps you create real-time audio and video streams using WebRTC or WebSockets. It allows you to turn any Python function into a live stream, making it useful for applications like voice chats or video conferencing. Key features include automatic voice detection, built-in UI support with Gradio, and integration with FastAPI for custom frontends. This library simplifies the process of handling real-time communication, allowing developers to focus on their application logic rather than complex streaming setups.
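
As a rough sketch of what "turning a Python function into a live stream" can look like, the example below wraps a generator that echoes audio back to the speaker. The `Stream` and `ReplyOnPause` names and their arguments are recalled from the project's README and may differ between versions; `build_stream` is a hypothetical helper, and pause detection may require optional extras of the package.

```python
def echo(audio):
    """Handler: receives (sample_rate, samples) once the speaker pauses
    and yields the same audio back."""
    sample_rate, samples = audio
    yield sample_rate, samples


def build_stream():
    """Wrap echo in a FastRTC audio stream (assumes `pip install fastrtc`)."""
    # Imported lazily so the handler above can be exercised
    # without FastRTC installed.
    from fastrtc import ReplyOnPause, Stream

    return Stream(
        handler=ReplyOnPause(echo),  # call echo each time the speaker pauses
        modality="audio",
        mode="send-receive",
    )
```

With FastRTC installed, `build_stream().ui.launch()` should open the built-in Gradio interface. Replacing the body of `echo` with a call to a speech model is where the application logic would go.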

https://github.com/freddyaboulton/fastrtc
#python #apple_silicon #audio_processing #mlx #multimodal #speech_recognition #speech_synthesis #speech_to_text #text_to_speech #transformers

MLX-Audio is a tool for text-to-speech and speech-to-speech conversion. It is built for Apple Silicon (M-series chips), where it runs fast and efficiently. You can choose from different languages and voices and adjust the speech speed. It also includes a web interface with a 3D audio visualization, where you can play your own files. This tool is helpful for making audiobooks, interactive media, and personal projects because it is easy to use and produces high-quality audio quickly.

https://github.com/Blaizzy/mlx-audio
#python #ai #ai_art #art #asset_generator #chatbot #deep_learning #desktop_app #image_generation #mistral #multimodal #privacy #pygame #pyside6 #speech_to_text #stable_diffusion #text_to_image #text_to_speech #text_to_speech_app

AI Runner lets you run AI models on your own computer without an internet connection. It supports **voice chatbots**, **text-to-image** generation, and **image editing**, and you can create AI personalities for more interesting conversations. Because everything runs locally, your data stays private. To use AI Runner, you need a capable computer with a strong GPU, such as an NVIDIA RTX 3060 or better, so that AI tasks run fast.

https://github.com/Capsize-Games/airunner
#jupyter_notebook #android #asr #deep_learning #deep_neural_networks #deepspeech #google_speech_to_text #ios #kaldi #offline #privacy #python #raspberry_pi #speaker_identification #speaker_verification #speech_recognition #speech_to_text #speech_to_text_android #stt #voice_recognition #vosk

Vosk is a powerful tool for recognizing speech without needing the internet. It supports over 20 languages and dialects, making it useful for many different users. Vosk is small and efficient, allowing it to work on small devices like smartphones and Raspberry Pi. It can be used for things like chatbots, smart home devices, and creating subtitles for videos. This means users can have private and fast speech recognition anywhere, which is especially helpful when internet access is limited.
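
For a sense of what offline recognition looks like in code, here is a minimal sketch using the Vosk Python bindings to transcribe a WAV file. It assumes `pip install vosk`, a model directory you have already downloaded, and a 16-bit mono PCM recording; `final_text` and `transcribe_wav` are hypothetical helper names.

```python
import json
import wave


def final_text(result_json):
    """Pull the transcript out of a recognizer result,
    which Vosk returns as a JSON string."""
    return json.loads(result_json).get("text", "")


def transcribe_wav(path, model_dir):
    """Transcribe a 16-bit mono PCM WAV file without a network connection."""
    # Imported lazily so final_text can be used without Vosk installed.
    from vosk import KaldiRecognizer, Model

    pieces = []
    with wave.open(path, "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(final_text(recognizer.Result()))
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(final_text(recognizer.FinalResult()))
    return " ".join(p for p in pieces if p)
```

Feeding the audio in small chunks is what lets the same loop work on a live microphone stream as well as on a file.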

https://github.com/alphacep/vosk-api
#typescript #accessibility #cross_platform #speech_to_text #tauri_v2

Handy is a free, open-source speech-to-text app that works offline on Windows, macOS, and Linux. You press a shortcut, speak, and your words appear in any text field without your voice being sent to the cloud. It uses models like Whisper and Parakeet for accurate transcription and runs with GPU acceleration or in CPU-only mode. Handy is simple, customizable, and extensible, so you can type hands-free while keeping control of your data.

https://github.com/cjpais/Handy