[FEATURE]: Speech-to-Text Voice Input for Lazy People in OpenCode
Feature hasn't been suggested before.
- [x] I have verified this feature I'm about to request hasn't been suggested before.
Describe the enhancement you want to request
Hi! First of all, congratulations on the amazing project.
I've been working on a Speech-to-Text voice input feature that integrates directly into the TUI. It allows users to start audio recording with a keybind, automatically transcribe speech using different providers, and insert the resulting text directly into the prompt.
I've built an initial working version, currently tested only on macOS, and the system includes:
- Real-time audio recording via FFmpeg;
- Support for Groq Whisper, OpenAI Whisper, and local whisper.cpp;
- Automatic microphone/device detection;
- Interactive menus for choosing provider, model, and audio device;
- Persistent configuration stored in ~/.opencode/state/speech.json;
- Customizable keybinds (Ctrl+X v, Ctrl+X P, Ctrl+X D);
- Smooth flow: record → transcribe → insert into prompt input.
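To make the flow above a bit more concrete, here is a minimal TypeScript sketch of the pieces involved. It is an illustration under assumptions, not the actual implementation: the `SpeechConfig` field names, the defaults, and all function names are hypothetical, and the non-macOS branch simply assumes ALSA on Linux. The FFmpeg flags (`-f avfoundation`, `-ac`, `-ar`) and the two hosted transcription endpoints are real, but how they are wired together here is a guess.

```typescript
// Illustrative sketch only: every name and field below is hypothetical.

type Provider = "groq" | "openai" | "whisper.cpp";

// Assumed shape of ~/.opencode/state/speech.json (the real schema may differ).
interface SpeechConfig {
  provider: Provider;
  model: string;
  device: string; // FFmpeg input spec, e.g. ":0" for avfoundation on macOS
}

const DEFAULTS: SpeechConfig = {
  provider: "groq",
  model: "whisper-large-v3",
  device: ":0",
};

// Merge persisted settings over the defaults; fall back to defaults on bad JSON.
function loadConfig(raw: string | undefined): SpeechConfig {
  if (!raw) return { ...DEFAULTS };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return { ...DEFAULTS };
    return { ...DEFAULTS, ...(parsed as Partial<SpeechConfig>) };
  } catch {
    return { ...DEFAULTS };
  }
}

// Build FFmpeg arguments for a mono, 16 kHz WAV recording.
// Assumption: avfoundation on macOS, ALSA elsewhere (where the device would
// be something like "default" rather than ":0").
function recordArgs(platform: string, device: string, outFile: string): string[] {
  const format = platform === "darwin" ? "avfoundation" : "alsa";
  return ["-y", "-f", format, "-i", device, "-ac", "1", "-ar", "16000", outFile];
}

// Hosted providers expose OpenAI-compatible transcription endpoints;
// whisper.cpp runs locally, so it has no URL.
function transcriptionUrl(provider: Provider): string | null {
  switch (provider) {
    case "groq":
      return "https://api.groq.com/openai/v1/audio/transcriptions";
    case "openai":
      return "https://api.openai.com/v1/audio/transcriptions";
    default:
      return null;
  }
}

// Insert the transcript at the cursor; return the new prompt and cursor position.
function insertTranscript(prompt: string, cursor: number, text: string) {
  const at = Math.max(0, Math.min(cursor, prompt.length));
  return {
    prompt: prompt.slice(0, at) + text + prompt.slice(at),
    cursor: at + text.length,
  };
}
```

In this sketch the keybind handler would spawn `ffmpeg` with `recordArgs(...)`, send the resulting file to the URL for the configured provider (or run whisper.cpp locally), and pass the returned text to `insertTranscript`. Keeping these steps as pure functions makes them easy to test without a microphone.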
Would this be something you'd be interested in integrating into the project?
That sounds cool. Did you integrate it using the plugin system, and if not, why? Maybe we need to expand it to allow for stuff like this.
This ticket inspired me to create this: https://github.com/chuckstack/groq-whisper
This is not nearly as good or as integrated as what is described above; however, I could not wait for the above to be accepted. I thought you would appreciate seeing a generic-tools approach to implementing it.
You can use it from:
- opencode: !groq-whisper # this injects the response directly into the context without the ability to edit prior to injection
- vim: :r !groq-whisper # this allows you to ctrl+p => open editor (vim) and capture the text before submitting
- terminal: groq-whisper # can be used outside of opencode
I hope this helps!
Edit to the details above:
- only tested on Debian (see notes for Mac)
- only uses Groq Whisper
Would be really awesome!
I was just looking for something like this today, it'd be awesome to see it implemented!
bump for interest!
Hey! Same here! I'd love it!
Hi, check out another implementation, #9264, which supports both Whisper models and audio large language models (gpt-4o, qwen3-omni, etc.).