whisper.cpp Streaming to a buffer or queue

Is there a way to use stream.cpp but send the output to a buffer or queue instead? I want to implement real-time speech-to-text and consume the transcription for something else in real time. How would I be able to do this? Thank you so much for the help!

Apr 02 '24 04:04 BlakelyP

The short answer is that there is no built-in functionality for this. The streaming example just calls whisper_full repeatedly and prints the text that whisper_full_get_segment_text returns for each segment. If you want to "stream" the output to a buffer or similar, you need to do something other than print the result of whisper_full_get_segment_text; that part is up to your C++ skills. As for the domain knowledge required: you can only really "stream" results while they are being decoded if you use greedy search, not the default beam search.
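A minimal sketch of "doing something other than print it", assuming you push each segment's text onto a thread-safe queue that another thread consumes. TranscriptQueue, fake_transcriber and run_demo are hypothetical names; the commented loop uses the real whisper_full_n_segments and whisper_full_get_segment_text calls, but exactly where that goes in stream.cpp is left to you.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Thread-safe FIFO of transcribed text (hypothetical helper).
// Producer: the transcription loop. Consumer: whatever needs the text live.
class TranscriptQueue {
public:
    // Enqueue one segment of text and wake a waiting consumer.
    void push(std::string text) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push() called after close()");
            items_.push(std::move(text));
        }
        cv_.notify_one();
    }

    // Signal that no more text will arrive; wakes all waiting consumers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Block until text is available; returns std::nullopt once the queue
    // has been closed and fully drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        std::string text = std::move(items_.front());
        items_.pop();
        return text;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> items_;
    bool closed_ = false;
};

// Hypothetical stand-in for the stream example's loop. In the real loop,
// the printf of each segment would become something like:
//   for (int i = 0; i < whisper_full_n_segments(ctx); ++i)
//       queue.push(whisper_full_get_segment_text(ctx, i));
void fake_transcriber(TranscriptQueue& queue, const std::vector<std::string>& segments) {
    for (const auto& s : segments) queue.push(s);
    queue.close();
}

// Runs the producer on a background thread and collects, in order,
// everything the consumer receives.
std::vector<std::string> run_demo(const std::vector<std::string>& segments) {
    TranscriptQueue queue;
    std::thread producer(fake_transcriber, std::ref(queue), std::cref(segments));
    std::vector<std::string> received;
    while (auto text = queue.pop()) received.push_back(*text);
    producer.join();
    return received;
}
```

The consumer blocks on a condition variable rather than polling, so it reacts as soon as a segment arrives. whisper_full_params also has a new_segment_callback field that whisper.cpp invokes as new segments are produced during a whisper_full call; pushing onto the queue from there is another option (check whisper.h in your version for the exact callback signature).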

Apr 06 '24 17:04 ephemer

For Python, see https://github.com/davabase/whisper_real_time. With the speech_recognition library, it's quite easy.

For C++, I think the FFmpeg API is probably the best option. Alternatively, it may be easier to re-implement speech_recognition in C++.
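As far as I can tell, the pattern whisper_real_time builds on speech_recognition is roughly this: a background capture callback pushes raw audio chunks onto a queue, and the main loop drains the queue, converts the samples to float and transcribes them. A C++ re-implementation could start from a buffer like the one below. AudioBuffer, kSampleRate and the max_seconds trimming policy are assumptions for illustration, and the capture side (FFmpeg, SDL or another API) is left out; the only whisper.cpp-specific fact used is that it expects 16 kHz mono float PCM in [-1, 1].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// whisper.cpp expects 16 kHz mono float PCM in [-1, 1].
constexpr int kSampleRate = 16000;

// Hypothetical buffer: a capture callback pushes raw int16 chunks, and the
// transcription loop periodically drains everything captured so far.
class AudioBuffer {
public:
    explicit AudioBuffer(double max_seconds)
        : max_samples_(static_cast<std::size_t>(max_seconds * kSampleRate)) {
        assert(max_seconds > 0);
    }

    // Called from the capture thread: convert int16 to float in [-1, 1].
    void push_chunk(const std::int16_t* data, std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < n; ++i)
            samples_.push_back(static_cast<float>(data[i]) / 32768.0f);
        // Drop the oldest samples so pending audio never exceeds the cap.
        if (samples_.size() > max_samples_) {
            auto excess = static_cast<std::ptrdiff_t>(samples_.size() - max_samples_);
            samples_.erase(samples_.begin(), samples_.begin() + excess);
        }
    }

    // Called from the transcription loop: take all pending samples and clear.
    std::vector<float> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<float> out(samples_.begin(), samples_.end());
        samples_.clear();
        return out;
    }

private:
    std::mutex mutex_;
    std::deque<float> samples_;
    std::size_t max_samples_;
};
```

In a real loop, the vector returned by drain() would be appended to the current phrase buffer and passed to whisper_full, which takes a const float * plus a sample count; the resulting text can then go into a queue like the one described in the earlier answer.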

Jul 25 '24 08:07 playgithub