Recognizer.recognize_vosk is slow for bigger models

Open InterStella0 opened this issue 3 years ago • 0 comments

Steps to reproduce

  1. With smaller models, Recognizer.recognize_vosk is fairly fast, but with a bigger model it slows down significantly.
  2. I copied the current Recognizer.recognize_vosk code and modified it slightly so that the KaldiRecognizer is stored on the Recognizer instance, the same way the vosk Model instance already is.
  3. The modified code is significantly faster.

Expected behaviour

The library method should be as fast as the modified version.

Actual behaviour

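A is the modified function from the full code below; B is the library's Recognizer.recognize_vosk.

A Time taken: 5.882570743560791 # <-- first call includes the one-time Model load, so it does not count.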
B Time taken: 0.6530005931854248
A Time taken: 0.18399715423583984
B Time taken: 0.5320005416870117
A Time taken: 0.5459988117218018
B Time taken: 1.0760009288787842
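
As a rough reading of these numbers, here is a minimal sketch that drops the first A/B pair (the first A call includes the model load) and compares the means of the remaining samples. The variable names are only illustrative, and with two samples per variant the result is indicative at best.

```python
from statistics import mean

# Timings copied from the output above, in seconds.
a_times = [5.882570743560791, 0.18399715423583984, 0.5459988117218018]
b_times = [0.6530005931854248, 0.5320005416870117, 1.0760009288787842]

# Drop the first pair: the first A call includes the one-time Model load.
a_warm = a_times[1:]
b_warm = b_times[1:]

# Ratio of mean B time to mean A time for the remaining calls.
speedup = mean(b_warm) / mean(a_warm)
print(f"A mean: {mean(a_warm):.3f}s, B mean: {mean(b_warm):.3f}s, speedup: {speedup:.1f}x")
```

On these samples the cached version (A) comes out roughly 2.2x faster than the library method (B).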

Full Code

import os
import time

import speech_recognition as sr
r = sr.Recognizer()


def recognize_vosk(self, audio_data, language='en'):
    from vosk import Model, KaldiRecognizer

    assert isinstance(audio_data, sr.AudioData), "Data must be audio data"

    if not hasattr(self, 'vosk_model'):
        if not os.path.exists("model"):
            return "Please download the model from https://github.com/alphacep/vosk-api/blob/master/doc/models.md and unpack as 'model' in the current folder."

        self.vosk_model = Model("model")

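    if not hasattr(self, 'vosk_model_kaldirecognizer'):
        # First call: build the KaldiRecognizer once and cache it on the instance.
        self.vosk_model_kaldirecognizer = rec = KaldiRecognizer(self.vosk_model, 16000)
    else:
        # Later calls: reuse the cached recognizer, clearing its previous state.
        rec = self.vosk_model_kaldirecognizer
        rec.Reset()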

    rec.AcceptWaveform(audio_data.get_raw_data(convert_rate=16000, convert_width=2))
    finalRecognition = rec.FinalResult()

    return finalRecognition


def timeit(callback, *args):
    # Return how long callback(*args) takes, in seconds.
    start = time.perf_counter()
    callback(*args)
    end = time.perf_counter()
    return end - start


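def on_word(_, audio):
    # A: the modified function above (cached KaldiRecognizer).
    a_time_taken = timeit(recognize_vosk, r, audio)
    # B: the library's own Recognizer.recognize_vosk.
    b_time_taken = timeit(r.recognize_vosk, audio)
    print("A Time taken:", a_time_taken)
    print("B Time taken:", b_time_taken)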


source = sr.Microphone()
print("Listening...")
r.pause_threshold = 1
back = r.listen_in_background(source, on_word, phrase_time_limit=2)
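while True:
    # Keep the main thread alive without busy-waiting, which would burn CPU and skew the timings.
    time.sleep(0.1)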

Discussion

It seems that caching the KaldiRecognizer and calling its Reset method between calls speeds recognition up significantly, so this looks like an actual issue in the recognize_vosk code.
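
The cache-and-reset pattern can be sketched without vosk installed. In the sketch below, `FakeRecognizer` and `Host` are hypothetical stand-ins (for the KaldiRecognizer and the Recognizer instance respectively), not part of either library. It only illustrates the control flow, not the real cost of constructing a recognizer.

```python
class FakeRecognizer:
    """Hypothetical stand-in for an object that is expensive to construct."""

    constructed = 0  # counts how many instances were ever built

    def __init__(self):
        FakeRecognizer.constructed += 1
        self.buffer = []

    def reset(self):
        # Clear per-utterance state so the instance can be reused.
        self.buffer.clear()

    def accept(self, chunk):
        self.buffer.append(chunk)

    def result(self):
        return " ".join(self.buffer)


class Host:
    """Hypothetical owner object, playing the role of the Recognizer instance."""

    def recognize(self, chunk):
        rec = getattr(self, "_cached_rec", None)
        if rec is None:
            # First call: pay the construction cost once and keep the instance.
            rec = self._cached_rec = FakeRecognizer()
        else:
            # Later calls: reuse the instance, but clear leftover state first.
            rec.reset()
        rec.accept(chunk)
        return rec.result()


host = Host()
outputs = [host.recognize(word) for word in ("one", "two", "three")]
print(outputs, FakeRecognizer.constructed)
```

Three calls construct the stand-in only once. The reset step matters: without it, state from the previous call would leak into the next result.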

InterStella0, Sep 14 '22 08:09