[Request for New Feature] Introduce Local Model Like faster-whisper
Hi Dhruvyad,
I love how lightweight uttertype is, and it's very handy. Instead of calling the Whisper API, do you plan on introducing a local model like faster-whisper for faster processing?
Best, Frank
Hey @Frank-Z-Chen,
Thanks for the suggestion. I've added a sample implementation with a local MLX-based Whisper model for macOS. It's ~15 lines of code and should look similar for any local library you wish to use.
You can copy this and swap in any local whisper library or transcription service you wish, then change just two lines: the transcriber used in main.py and its corresponding import.
Let me know if you have any questions.
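To make the two-line swap concrete, here's a minimal sketch with stub classes standing in for the repo's actual transcribers (the class and module names below are illustrative, not uttertype's real identifiers):

```python
import asyncio

# Stubs standing in for the repo's transcriber classes; in uttertype you
# would import these from their real modules (names here are illustrative).
class WhisperAPITranscriber:
    """Transcriber that calls the hosted Whisper API."""
    def transcribe(self, audio: bytes) -> str:
        return "transcribed via API"

class MLXWhisperTranscriber:
    """Transcriber that runs a local MLX Whisper model."""
    def transcribe(self, audio: bytes) -> str:
        return "transcribed via local MLX model"

async def main():
    # The swap: this constructor call (plus its import at the top of
    # main.py) is all that changes.
    transcriber = MLXWhisperTranscriber()  # was: WhisperAPITranscriber()
    print(transcriber.transcribe(b"...raw audio..."))  # transcribed via local MLX model

asyncio.run(main())
```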
Hi @dhruvyad! Newbie to Python here, but learning.
I attempted to change the code to use faster-whisper, but I'm stumped. Here's what I tried:

main.py

```python
import asyncio
import faster_whisper
from pynput import keyboard

# from transcriber import faster_whisper
from table_interface import ConsoleTable
from key_listener import create_keylistener
from dotenv import load_dotenv
from utils import manual_type
from faster_whisper.transcribe import WhisperModel


async def main():
    load_dotenv()
    transcriber = faster_whisper()
    hotkey = create_keylistener(transcriber)
    keyboard.Listener(on_press=hotkey.press, on_release=hotkey.release).start()

    console_table = ConsoleTable()
    with console_table:
        async for transcription, audio_duration_ms in transcriber.get_transcriptions():
            manual_type(transcription.strip())
            console_table.insert(
                transcription,
                round(0.0001 * audio_duration_ms / 1000, 6),
            )


if __name__ == "__main__":
    asyncio.run(main())
```
transcriber.py section you highlighted:

```python
class WhisperModel:
    def __init__(self, model_type="distil-medium.en", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model = WhisperModel(model_size, device="cuda", compute_type="float16")
        model_size = "large-v3"

    def transcribe_audio(self, audio: io.BytesIO) -> str:
        try:
            with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmpfile:
                tmpfile.write(audio.getvalue())
                transcription = self.model.transcribe(tmpfile.name)["text"]
            os.unlink(tmpfile.name)
            return transcription
        except Exception as e:
            print(f"Encountered Error: {e}")
            return ""
```
I won't bother printing out the errors, as I'm sure I've got something wrong with the code.
Hopefully you can provide a pointer! Thanks in advance, and thank you so much for this project!
@undrwater Thanks for checking out this repo. Here are a few things I noticed:
- In `main.py`, you're importing `WhisperModel` from `faster_whisper` instead of the `WhisperModel` class you created in `transcriber.py`.
- In `transcriber.py`, your `WhisperModel` class doesn't inherit from `AudioTranscriber`, and there's a naming conflict: your class and the imported class from `faster_whisper` are both called `WhisperModel`.
In general, I'd recommend using LLMs to work through issues like these faster. Cursor is a good tool for fixing these bugs and learning Python.
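Putting those fixes together, here's a sketch of what a corrected transcriber could look like. To keep the snippet runnable without the dependency, stubs stand in for uttertype's `AudioTranscriber` base class and for faster-whisper's `WhisperModel`; in real code you'd import both, as noted in the comments. One extra catch, beyond the two bugs above: faster-whisper's `transcribe()` returns a `(segments, info)` tuple, not a dict, so the `["text"]` indexing would also fail.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod
from types import SimpleNamespace

# Stub for uttertype's AudioTranscriber base class; in the repo you would
# import the real one from its module instead.
class AudioTranscriber(ABC):
    @abstractmethod
    def transcribe_audio(self, audio: io.BytesIO) -> str: ...

# Stub mimicking faster-whisper's (segments, info) return shape. In real code:
#   from faster_whisper import WhisperModel as FasterWhisperModel
# (aliased so it can't collide with a class of your own).
class FasterWhisperModel:
    def __init__(self, model_size, device="cpu", compute_type="int8"):
        self.model_size = model_size

    def transcribe(self, path):
        segments = [SimpleNamespace(text="hello"), SimpleNamespace(text=" world")]
        info = SimpleNamespace(language="en")
        return segments, info

class FasterWhisperTranscriber(AudioTranscriber):
    """Distinct name (no clash with the library class), inherits the base."""

    def __init__(self, model_size="large-v3"):
        # model_size is a parameter, defined before it's used.
        self.model = FasterWhisperModel(model_size, device="cuda", compute_type="float16")

    def transcribe_audio(self, audio: io.BytesIO) -> str:
        try:
            with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmpfile:
                tmpfile.write(audio.getvalue())
            # faster-whisper yields segments lazily; join their text pieces.
            segments, _info = self.model.transcribe(tmpfile.name)
            text = "".join(seg.text for seg in segments)
            os.unlink(tmpfile.name)
            return text
        except Exception as e:
            print(f"Encountered Error: {e}")
            return ""
```

In main.py you'd then import `FasterWhisperTranscriber` from transcriber.py and construct it with `transcriber = FasterWhisperTranscriber()`, replacing the `transcriber = faster_whisper()` line.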