tjongsma
Hi @amdrozdov, I've been using your PR to do some basic streaming, and it seems to work pretty decently! The only issue I have is that on occasion it seems...
Thanks @amdrozdov! I'll try it out, that last part seems tricky to implement.
Alright, back again: since it seemed tricky, I disabled VAD and set batch_size=1 just to see if that would work first. Turns out, it doesn't. The same issue remains....
I've got exactly the same problem after running the Whisper example using large-v3, Windows 10, TensorRT-LLM 11.0 on a fresh virtualenv.
> why do we need to set 'remove_input_padding disable' ? I'm guessing we don't; I'm just following the example. But I've tried, for example, just manually setting input_padding and removing...
Just chiming in, I've tried using v3-turbo for streaming and found that it hallucinates more/misses audio more than other faster-whisper models. For example for this 10 second audio clip of...
> > Just chiming in, I've tried using v3-turbo for streaming and found that it hallucinates more/misses audio more than other faster-whisper models. For example for this 10 second audio...
So I'm using it to do live streaming with Whisper, which is why I want to use the medium model for better latency. This means I'm using a 15-second rolling window...
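For reference, the rolling-window approach described above can be sketched roughly like this: keep a fixed-length audio buffer, append incoming chunks, and trim to the most recent 15 seconds before each transcription call. This is only an illustration of the buffering logic, not the PR's actual implementation; the transcription call itself is left out, since it depends on the model API being used.

```python
import numpy as np

SAMPLE_RATE = 16_000             # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 15              # rolling window length from the comment above
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


class RollingWindow:
    """Keeps only the most recent `max_samples` of audio."""

    def __init__(self, max_samples: int = WINDOW_SAMPLES):
        self.max_samples = max_samples
        self.buffer = np.zeros(0, dtype=np.float32)

    def append(self, chunk: np.ndarray) -> np.ndarray:
        """Append a new audio chunk and trim to the window length.

        Returns the current window, ready to hand to a transcription
        call (e.g. a faster-whisper model) each time new audio arrives.
        """
        self.buffer = np.concatenate([self.buffer, chunk.astype(np.float32)])
        if len(self.buffer) > self.max_samples:
            # Drop the oldest samples so the window slides forward
            self.buffer = self.buffer[-self.max_samples:]
        return self.buffer
```

One caveat with this scheme is that words at the leading edge of the window can be re-transcribed (or cut off) on successive calls, which may be related to the hallucination/missed-audio behavior discussed in this thread.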
Alright, intuitively that makes sense, but when I used it with large models it did perform much faster than the unbatched version and gave good results (very similar to the...