GitHub Trends

#cplusplus #aarch64 #android #arm32 #asr #cpp #csharp #dotnet #ios #lazarus #linux #macos #mfc #object_pascal #onnx #raspberry_pi #risc_v #speech_to_text #text_to_speech #vits #windows

This tool supports various speech functions like speech recognition, text-to-speech, speaker identification, and more. It works on multiple platforms including Android, iOS, Windows, macOS, and Linux, and supports several programming languages such as C++, Python, JavaScript, and others. You can use it locally or through web assembly, making it versatile and convenient. This benefits you by allowing you to integrate advanced speech capabilities into your projects easily, regardless of the platform or programming language you use.

https://github.com/k2-fsa/sherpa-onnx

GitHub

GitHub - k2-fsa/sherpa-onnx: Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD…

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr...

634 views11:30

GitHub Trends

#python #artificial_intelligence #llm #python #real_time #speech_to_text #text_to_speech

FastRTC is a Python library that helps you create real-time audio and video streams using WebRTC or WebSockets. It allows you to turn any Python function into a live stream, making it useful for applications like voice chats or video conferencing. Key features include automatic voice detection, built-in UI support with Gradio, and integration with FastAPI for custom frontends. This library simplifies the process of handling real-time communication, allowing developers to focus on their application logic rather than complex streaming setups.

https://github.com/freddyaboulton/fastrtc

GitHub

GitHub - gradio-app/fastrtc: The python library for real-time communication

The python library for real-time communication. Contribute to gradio-app/fastrtc development by creating an account on GitHub.

658 views12:30

GitHub Trends

#python #asr #automatic_speech_recognition #conformer #e2e_models #production_ready #pytorch #speech_recognition #transformer #whisper

WeNet is a powerful tool for speech recognition that helps turn spoken words into text. It's designed to be easy to use and works well in real-world situations, making it great for businesses and developers. WeNet provides accurate results on many public datasets and is lightweight, meaning it doesn't require a lot of resources to run. This makes it beneficial for users who need reliable speech-to-text functionality without complex setup or maintenance.

https://github.com/wenet-e2e/wenet

GitHub

GitHub - wenet-e2e/wenet: Production First and Production Ready End-to-End Speech Recognition Toolkit

Production First and Production Ready End-to-End Speech Recognition Toolkit - wenet-e2e/wenet

452 views15:30

GitHub Trends

#python #apple_silicon #audio_processing #mlx #multimodal #speech_recognition #speech_synthesis #speech_to_text #text_to_speech #transformers

MLX-Audio is a powerful tool for converting text into speech and speech into new audio. It works well on Apple Silicon devices, like M-series chips, making it fast and efficient. You can choose from different languages and voices, and even adjust how fast the speech is. It also includes a web interface where you can see audio in 3D and play your own files. This tool is helpful for making audiobooks, interactive media, and personal projects because it's easy to use and provides high-quality audio quickly.

https://github.com/Blaizzy/mlx-audio

GitHub

GitHub - Blaizzy/mlx-audio: A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX…

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon. - Blaizzy/mlx-audio

529 views12:00

GitHub Trends

#python #asr #deeplearning #generative_ai #large_language_models #machine_translation #multimodal #neural_networks #speaker_diariazation #speaker_recognition #speech_synthesis #speech_translation #tts

NVIDIA NeMo is a powerful, easy-to-use platform for building, customizing, and deploying generative AI models like large language models (LLMs), vision language models, and speech AI. It lets you quickly train and fine-tune models using pre-built code and checkpoints, supports the latest model architectures, and works on cloud, data center, or edge environments. NeMo 2.0 is even more flexible and scalable, with Python-based configuration and modular design, making it simple to experiment and scale up. The main benefit is that you can create advanced AI applications faster, with less effort, and at lower cost, while getting high performance and easy deployment options[1][2][3].

https://github.com/NVIDIA/NeMo

GitHub

GitHub - NVIDIA-NeMo/NeMo: A scalable generative AI framework built for researchers and developers working on Large Language Models…

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech) - NVIDIA-NeMo/NeMo

548 views13:00

GitHub Trends

#python #ai #ai_art #art #asset_generator #chatbot #deep_learning #desktop_app #image_generation #mistral #multimodal #privacy #pygame #pyside6 #python #self_hosted #speech_to_text #stable_diffusion #text_to_image #text_to_speech #text_to_speech_app

AI Runner is a tool that lets you use AI on your own computer without needing the internet. It can do many things like **voice chatbots**, **text-to-image** generation, and **image editing**. You can also make AI personalities for more interesting conversations. It runs fast and securely, keeping your data private. To use AI Runner, you need a good computer with a strong GPU, like an NVIDIA RTX 3060 or better. This helps keep your data safe and makes AI tasks faster.

https://github.com/Capsize-Games/airunner

GitHub

GitHub - Capsize-Games/airunner: Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated…

Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows - Capsize-Games/airunner

465 views11:30

GitHub Trends

#jupyter_notebook #android #asr #deep_learning #deep_neural_networks #deepspeech #google_speech_to_text #ios #kaldi #offline #privacy #python #raspberry_pi #speaker_identification #speaker_verification #speech_recognition #speech_to_text #speech_to_text_android #stt #voice_recognition #vosk

Vosk is a powerful tool for recognizing speech without needing the internet. It supports over 20 languages and dialects, making it useful for many different users. Vosk is small and efficient, allowing it to work on small devices like smartphones and Raspberry Pi. It can be used for things like chatbots, smart home devices, and creating subtitles for videos. This means users can have private and fast speech recognition anywhere, which is especially helpful when internet access is limited.

https://github.com/alphacep/vosk-api

GitHub

GitHub - alphacep/vosk-api: Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and…

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node - alphacep/vosk-api

447 views13:30

GitHub Trends

#python #audiobook #audiobooks #content_creation #content_creator #epub_converter #kokoro #kokoro_82m #kokoro_tts #media_generation #narrator #speech_synthesis #subtitles #text_to_audio #text_to_speech #tts #voice_synthesis

Abogen is a user-friendly tool that quickly converts ePub, PDF, or text files into natural-sounding audio with synchronized subtitles, perfect for creating audiobooks or voiceovers for social media and other projects. You can customize speech speed, choose or mix voices, generate subtitles by sentence or word, and select various audio and subtitle formats. It supports batch processing with queue mode and lets you save chapters separately or merged. Installation is straightforward on Windows, Mac, and Linux, with options for GPU acceleration. This saves you time and effort in producing high-quality audio content from text files efficiently.

https://github.com/denizsafak/abogen

GitHub

GitHub - denizsafak/abogen: Generate audiobooks from EPUBs, PDFs and text with synchronized captions.

Generate audiobooks from EPUBs, PDFs and text with synchronized captions. - denizsafak/abogen

❤1

446 views13:30

GitHub Trends

#typescript #ai #ai_chatbot #angular #chat #chatbot #chatgpt #cohere #component #files #huggingface #image #nextjs #openai #react #react_chatbot #solid #speech #svelte #vue

Deep Chat is an easy-to-add AI chat tool for your website that connects with popular AI services like ChatGPT and HuggingFace or your own custom APIs using just one line of code. It supports text, voice input, speech-to-text, text-to-speech, file sharing, webcam photos, and audio recording, making conversations more interactive. You can customize everything from avatars to message styles and run small AI models directly in the browser without servers. It works with major web frameworks and offers features like local message storage and focus mode for a modern chat experience. This helps you quickly add a powerful, flexible AI chatbot that fits your needs and improves user engagement.

https://github.com/OvidijusParsiunas/deep-chat

GitHub

GitHub - OvidijusParsiunas/deep-chat: Fully customizable AI chatbot component for your website

Fully customizable AI chatbot component for your website - OvidijusParsiunas/deep-chat

402 views12:30

GitHub Trends

#typescript #accessibility #cross_platform #speech_to_text #tauri_v2

Handy is a free, open-source speech-to-text app that works offline on Windows, macOS, and Linux. You press a shortcut, speak, and your words appear in any text field without sending your voice to the cloud, keeping your data private. It uses advanced models like Whisper and Parakeet for accurate transcription and supports GPU acceleration or CPU-only modes. Handy is simple, privacy-focused, and customizable, making it ideal if you want a secure, extensible tool for converting speech to text without relying on internet services. This helps you type hands-free while protecting your privacy and controlling your data.

https://github.com/cjpais/Handy

GitHub

GitHub - cjpais/Handy: A free, open source, and extensible speech-to-text application that works completely offline.

A free, open source, and extensible speech-to-text application that works completely offline. - cjpais/Handy

462 views11:30

About

Blog

Apps

Platform