
[Request for New Feature] Introduce Local Model Like faster-whisper

Open Frank-Z-Chen opened this issue 1 year ago • 3 comments

Hi Dhruvyad,

I love how lightweight uttertype is, and it is very handy. Do you plan to add support for a local model such as faster-whisper, as an alternative to calling the Whisper API, for faster processing?

Best, Frank

Frank-Z-Chen avatar Apr 27 '24 04:04 Frank-Z-Chen

Hey @Frank-Z-Chen,

Thanks for the suggestion. I've added a sample implementation with a local MLX-based Whisper model for macOS. It's ~15 lines of code and should look similar for any local library you wish to use.

You can copy this and swap in any local Whisper library or transcription service you wish, then change two lines in main.py: the transcriber that gets instantiated and its corresponding import.

Let me know if you have any questions.

dhruvyad avatar May 15 '24 01:05 dhruvyad

Hi @dhruvyad ! Newb to python here but learning.

I attempted to change the code to utilize faster-whisper, but I'm stumped. Here's what I tried:

main.py:

import asyncio
import faster_whisper
from pynput import keyboard
# from transcriber import faster_whisper
from table_interface import ConsoleTable
from key_listener import create_keylistener
from dotenv import load_dotenv
from utils import manual_type
from faster_whisper.transcribe import WhisperModel

async def main():
    load_dotenv()

    transcriber = faster_whisper()
    hotkey = create_keylistener(transcriber)

    keyboard.Listener(on_press=hotkey.press, on_release=hotkey.release).start()
    console_table = ConsoleTable()
    with console_table:
        async for transcription, audio_duration_ms in transcriber.get_transcriptions():
            manual_type(transcription.strip())
            console_table.insert(
                transcription,
                round(0.0001 * audio_duration_ms / 1000, 6),
            )


if __name__ == "__main__":
    asyncio.run(main())

transcriber.py section you highlighted:

class WhisperModel:
    def __init__(self, model_type="distil-medium.en", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model = WhisperModel(model_size, device="cuda", compute_type="float16")
        model_size = "large-v3"

    def transcribe_audio(self, audio: io.BytesIO) -> str:
        try:
            with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmpfile:
                tmpfile.write(audio.getvalue())
                transcription = self.model.transcribe(tmpfile.name)["text"]
                os.unlink(tmpfile.name)
            return transcription
        except Exception as e:
            print(f"Encountered Error: {e}")
            return ""

I won't bother printing out the errors, as I'm sure I've got something wrong with the code.

Hopefully you can provide a pointer! Thanks in advance, and thank you so much for this project!

undrwater avatar Mar 16 '25 23:03 undrwater

@undrwater Thanks for checking out this repo. Here are a few things I noticed:

  1. In main.py, you seem to be importing WhisperModel from faster_whisper instead of the WhisperModel class you created in transcriber.py
  2. In transcriber.py, your WhisperModel class doesn't inherit from AudioTranscriber and there's a naming conflict where your class and the imported class from faster_whisper are both called WhisperModel.
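
A minimal sketch of how those two fixes might look in transcriber.py, offered as a starting point rather than a tested implementation. The class name FasterWhisperTranscriber and the optional `model` argument are hypothetical, and the empty AudioTranscriber below is only a stand-in so the snippet runs on its own; in the repo you would inherit from the real AudioTranscriber instead, whose constructor arguments are not shown in this thread. Two further details in the attempt above are also handled here: `model_size` was used before it was assigned, and faster-whisper's `transcribe` returns a `(segments, info)` tuple rather than a dict, so indexing with `["text"]` will not work.

```python
import io


class AudioTranscriber:
    """Stand-in for uttertype's base class (assumption); use the real one in the repo."""


class FasterWhisperTranscriber(AudioTranscriber):
    """Hypothetical local transcriber; named to avoid clashing with faster_whisper.WhisperModel."""

    def __init__(self, model_size="large-v3", device="cuda",
                 compute_type="float16", model=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if model is None:
            # Imported lazily so this file still loads without faster-whisper installed.
            from faster_whisper import WhisperModel
            model = WhisperModel(model_size, device=device, compute_type=compute_type)
        self.model = model

    def transcribe_audio(self, audio: io.BytesIO) -> str:
        try:
            audio.seek(0)  # rewind in case the buffer was just written to
            # transcribe() accepts a path or a binary file-like object and
            # returns (segments, info); segments is a lazy iterable.
            segments, _info = self.model.transcribe(audio)
            return "".join(segment.text for segment in segments)
        except Exception as e:
            print(f"Encountered Error: {e}")
            return ""
```

With something like this in place, the two lines in main.py would become roughly `from transcriber import FasterWhisperTranscriber` and `transcriber = FasterWhisperTranscriber()`. The bare `import faster_whisper` and `transcriber = faster_whisper()` lines should go, since the latter tries to call a module.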

In general, I would recommend using LLMs to get through issues like these faster. Cursor is a good tool for fixing these bugs and learning Python.

dhruvyad avatar Mar 17 '25 09:03 dhruvyad