Whispering Tiger - OpenAI's Whisper (and other models) with OSC and Websocket support, allowing live transcription and translation in VRChat and in overlays for most streaming applications.
Whispering Tiger (Live Translate/Transcribe)
Whispering Tiger is a free and open-source tool that can listen to any audio stream or watch in-game images on your machine, and outputs the transcription or translation to a web browser using Websockets or over OSC (for example, to streaming overlays or VRChat).
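Whispering Tiger's own OSC code is not reproduced here. As a hedged illustration of what "output over OSC" looks like on the wire, the following minimal sketch encodes an OSC 1.0 message by hand using only the Python standard library and sends it to VRChat's chatbox. It assumes VRChat's documented /chatbox/input address (text, send-immediately flag, notification-sound flag) and VRChat's default OSC input port 9000; encode_osc_message and send_to_vrchat_chatbox are hypothetical helper names, not part of Whispering Tiger.

```python
import socket
import struct


def _osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC 1.0 requires for strings."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc_message(address: str, *args) -> bytes:
    """Encode an OSC message with str, bool, int and float arguments (hypothetical helper)."""
    type_tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):  # check bool before int: bool is a subclass of int
            type_tags += "T" if arg else "F"  # T/F tags carry no payload bytes
        elif isinstance(arg, int):
            type_tags += "i"
            payload += struct.pack(">i", arg)  # 32-bit big-endian integer
        elif isinstance(arg, float):
            type_tags += "f"
            payload += struct.pack(">f", arg)  # 32-bit big-endian float
        elif isinstance(arg, str):
            type_tags += "s"
            payload += _osc_pad(arg.encode("utf-8"))
        else:
            raise TypeError(f"unsupported OSC argument type: {type(arg).__name__}")
    return _osc_pad(address.encode("utf-8")) + _osc_pad(type_tags.encode("ascii")) + payload


def send_to_vrchat_chatbox(text: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send text to VRChat's chatbox over UDP (assumes VRChat's default input port 9000)."""
    # /chatbox/input arguments: text, send immediately (True), play notification sound (False)
    message = encode_osc_message("/chatbox/input", text, True, False)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))
```

For example, send_to_vrchat_chatbox("Hello") would place the text in the chatbox when VRChat is running with OSC enabled. UDP is fire-and-forget, so the call succeeds even if nothing is listening.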


Content:
- Features
- Plugins
  - List of Plugins
  - How to create Plugins
- Quickstart
  - Release Downloads
- Usage
  - Usage with 3rd Party Applications
    - VRChat
    - Live Streaming Applications (OBS, vMix, XSplit ...)
    - Desktop+
  - Websocket Clients
- Configurations
  - Command-line flags
  - Settings file
- Working with the Code
- FAQ
- Sources
Features
- Runs 100% locally on your machine (once the A.I. models are downloaded, no further internet connection is required)
- Speech recognition, translation and transcription
  - OpenAI's Whisper (supports ~98 languages)
  - Meta's Seamless M4T (multimodal, supports ~101 languages)
  - Microsoft's SpeechT5 (English only)
  - NVIDIA's NeMo Canary (English, Spanish, German and French)
  - Wav2Vec Bert 2.0 (English and German)
- Text translation
  - LID [Language Identification] (supports 200 languages)
  - NLLB-200 (single model, supports 200 languages, high accuracy)
  - M2M-100 (single model, supports 100 languages, high accuracy)
  - Seamless M4T (single model, multimodal, supports ~101 languages)
- OCR [Optical Character Recognition] (captures game images to translate in-game text)
  - EasyOCR (supports 80+ languages)
- TTS [Text-to-Speech] (reads out transcriptions and translations)
  - Silero
- VAD [Voice Activity Detection]
  - Silero-VAD
- RVC [Retrieval-based Voice Conversion] (converts your voice, the voice in audio files, or Text-to-Speech output)
  - RVC (using the RVC STS Whispering Tiger plugin)
- LLM [Large Language Model] (text continuation, automatic answer generation, etc.; proof of concept)
  - FLAN-T5, GPT-J, Bloomz, etc. (using the Whispering Tiger LLM plugin)
- And more using other plugins...
See all available Plugins in the List of Plugins.
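Voice Activity Detection decides which stretches of audio contain speech, so that only those segments are passed on to the speech-recognition model. Silero-VAD does this with a neural network; the minimal sketch below deliberately swaps in a much simpler RMS energy threshold, purely to illustrate the gating idea. The function names, the threshold value and the "hangover" parameter are illustrative assumptions, not Whispering Tiger's implementation.

```python
import math
from typing import Iterable, List


def rms(frame: List[float]) -> float:
    """Root-mean-square amplitude of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_speech(frames: Iterable[List[float]], threshold: float = 0.02,
                   hangover: int = 3) -> List[List[float]]:
    """Group consecutive voiced frames into utterances (energy stand-in for Silero-VAD).

    A frame counts as speech when its RMS exceeds `threshold`. Up to `hangover`
    silent frames are kept inside an utterance so short pauses don't split it.
    """
    segments: List[List[float]] = []
    current: List[float] = []
    silent_run = 0
    for frame in frames:
        if rms(frame) > threshold:
            current.extend(frame)
            silent_run = 0
        elif current:
            silent_run += 1
            if silent_run > hangover:
                segments.append(current)  # utterance ended: would go to the recognizer
                current = []
                silent_run = 0
            else:
                current.extend(frame)  # keep a short pause inside the utterance
    if current:
        segments.append(current)  # flush an utterance still open at end of stream
    return segments
```

A learned VAD such as Silero-VAD is far more robust to background noise and music than a fixed energy threshold, which is why a model is used in practice.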
Quickstart
For a quick and easy start, download the latest Whispering Tiger UI from here: https://github.com/Sharrnah/whispering-ui
This is a native UI application that keeps your Whispering Tiger version up to date and makes managing the settings easier.
Release Downloads
Standalone releases with all dependencies included.
Go to the GitHub Releases page and download from the link in the release description, or find the latest release here.
(Because of GitHub's 2 GB file-size limit, the release files are not attached directly to the GitHub release.)
- Install CUDA for GPU acceleration (recommended).
- Extract the files to a drive with enough free space.
  - (After downloading the medium Whisper model and the medium NLLB-200 translation model, the folder can take up to 20 GB.)
- Run the application only through the *.bat files. To use your own command-line flags, copy or edit an existing start-*.bat file and change its parameters in any text editor. start-transcribe-mic.bat tries to use your default microphone and is a good starting point.
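Since the extracted release plus the medium Whisper and NLLB-200 models can take up to 20 GB, it can help to check the target drive before extracting. The minimal sketch below uses Python's standard shutil.disk_usage; the has_enough_space helper name is hypothetical, and it treats "20 GB" as 20 GiB, which is slightly conservative.

```python
import shutil
import sys

GIB = 1024 ** 3
REQUIRED_GIB = 20  # upper estimate with the medium Whisper + medium NLLB-200 models


def has_enough_space(path: str, required_gib: float = REQUIRED_GIB) -> bool:
    """Return True if the drive containing `path` has at least `required_gib` GiB free."""
    return shutil.disk_usage(path).free >= required_gib * GIB


if __name__ == "__main__":
    # Usage: python check_space.py D:\  (defaults to the current directory)
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    free_gib = shutil.disk_usage(target).free / GIB
    status = "OK" if has_enough_space(target) else "not enough space"
    print(f"{target}: {free_gib:.1f} GiB free ({status})")
```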
Sources
Thanks go to:
- OpenAI https://github.com/openai/whisper
- Awexander https://github.com/Awexander/audioWhisper
- Blake https://github.com/mallorbc/whisper_mic
- Meta (LID, NLLB-200, M2M-100) https://ai.facebook.com/blog/nllb-200-high-quality-machine-translation/
- Meta (Seamless M4T) https://github.com/facebookresearch/seamless_communication
- faster-whisper https://github.com/guillaumekln/faster-whisper
- EasyOCR https://github.com/jaidedai/easyocr
- Silero (TTS, VAD) https://github.com/snakers4/silero-models