whisper.cpp

Can the whisper stream example support audio files as input, such as PCM or WAV?

Open yuanconghao opened this issue 3 years ago • 9 comments

yuanconghao avatar Apr 19 '23 11:04 yuanconghao

You can use a tool like ffmpeg or avconv to convert any audio or video format into WAV that Whisper can read! Try something like this:

$ ffmpeg -i video.mp4 -f wav -ar 16000 - | ./main -m path/to/model.ggml.bin -

Note the trailing - on both commands: it tells ffmpeg to write the WAV data to stdout and tells whisper to read its input from stdin.
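For raw PCM, which has no header at all, the WAV header can also be added without ffmpeg. Below is a minimal sketch using Python's standard wave module. It assumes the raw data is 16-bit little-endian mono at 16 kHz (the rate used in the command above); the function name pcm_to_wav_bytes is hypothetical and not part of whisper.cpp.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap headerless 16-bit PCM in a WAV container, entirely in memory.

    Assumption: the input is signed 16-bit little-endian samples.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)            # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

Writing the returned bytes to sys.stdout.buffer should let the result be piped into ./main in the same way as the ffmpeg output. If the PCM has a different rate or sample format, it still needs resampling with ffmpeg or sox first.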

gcr avatar Apr 21 '23 15:04 gcr

got it, thanks.

yuanconghao avatar Apr 23 '23 02:04 yuanconghao

Sox is also very handy: sox input.wav -r 16000 -b 16 output.wav
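Before converting, it can be worth checking whether a file already has the rate and bit depth that the sox command targets. A small sketch with Python's standard wave module; the helper name wav_matches is made up for illustration, and the 16000 / 16 defaults are simply the values from the command above.

```python
import wave


def wav_matches(source, rate: int = 16000, bits: int = 16) -> bool:
    """Return True if a WAV file has the given sample rate and bit depth.

    `source` may be a file path or a binary file object.
    The wave module only handles plain PCM WAV and raises wave.Error otherwise.
    """
    with wave.open(source, "rb") as w:
        return w.getframerate() == rate and w.getsampwidth() * 8 == bits
```

A file for which this returns False would still need to go through sox or ffmpeg.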

franalbani avatar Apr 25 '23 16:04 franalbani

Is there a way to feed audio files into a continuously waiting instance of stream? I've been using main on demand, but it's very slow compared to how stream works, probably because it has to load the model every time a new audio snippet is ready.

Stream is much better for fast 'on demand' work, except that its only input option is the microphone, which in my case is already occupied.

// It seems the server tool is useful for this use case: https://github.com/ggerganov/whisper.cpp/tree/master/examples/server
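A rough sketch of what sending a file to an already running server instance could look like, using only the Python standard library so that the model stays loaded between requests. The URL (127.0.0.1:8080/inference), the "file" field name and the "response_format" field are assumptions based on the server example's README as I recall it, so check them against your build; the function names are hypothetical.

```python
import os
import urllib.request
import uuid


def build_multipart(filename: str, audio: bytes, fields: dict[str, str]) -> tuple[bytes, str]:
    """Encode one audio file plus plain text fields as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # Assumption: the server expects the audio in a form field called "file".
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'.encode() + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path: str, url: str = "http://127.0.0.1:8080/inference") -> str:
    """POST a WAV file to a running server and return the raw response text."""
    with open(wav_path, "rb") as f:
        body, content_type = build_multipart(
            os.path.basename(wav_path), f.read(), {"response_format": "json"}
        )
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling transcribe("snippet.wav") each time a new snippet is ready would avoid starting main, and reloading the model, for every file.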

flatsiedatsie avatar Jan 26 '24 16:01 flatsiedatsie

@slaren would you consider a PR optionally linking the server/stream executables with libffmpeg/libsox in order to convert the input on the fly and in memory (vs. triggering/running an external process)? Best

WilliamTambellini avatar Apr 02 '24 16:04 WilliamTambellini

That would be up to @ggerganov, but I see no issue with it as long as it is optional.

slaren avatar Apr 02 '24 16:04 slaren

Yup, it could be a good addition

ggerganov avatar Apr 09 '24 15:04 ggerganov

Good. @ggerganov Is libffmpeg the best option, or would you prefer another lib (sox, ...)? Refs: https://github.com/FFmpeg/FFmpeg https://johnvansickle.com/ffmpeg/

WilliamTambellini avatar Apr 26 '24 23:04 WilliamTambellini

Why not make them compile-time options?

r0d0dendr0n avatar Apr 27 '24 01:04 r0d0dendr0n