tjongsma
Hi @amdrozdov, I've been using your PR to do some basic streaming, and it seems to work pretty decently! The only issue I have is that on occasion it seems...
Thanks @amdrozdov! I'll try it out, that last part seems tricky to implement.
Alright, back again: since it seemed tricky, I disabled VAD and set batch_size=1 just to see if that would work first. Turns out, it doesn't. The same issue remains....
I've got exactly the same problem after running the Whisper example using large-v3, Windows 10, TensorRT-LLM 11.0 on a fresh virtualenv.
> why do we need to set 'remove_input_padding disable' ? I'm guessing we don't; I'm just following the example. But I've tried, for example, just manually setting input_padding and removing...
Just chiming in, I've tried using v3-turbo for streaming and found that it hallucinates more/misses audio more than other faster-whisper models. For example for this 10 second audio clip of...
> > Just chiming in, I've tried using v3-turbo for streaming and found that it hallucinates more/misses audio more than other faster-whisper models. For example for this 10 second audio...
So I'm using it to do live streaming with Whisper, which is why I want to use the medium model for better latency. This means I'm using a 15-second rolling window...
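For reference, the rolling-window approach described above can be sketched roughly like this: keep a fixed-length audio buffer, append incoming chunks, and trim to the most recent 15 seconds before each transcription call. This is only an illustration of the buffering logic, not the PR's actual implementation; the transcription call itself is left out, since it depends on the model API being used.

```python
import numpy as np

SAMPLE_RATE = 16_000             # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 15              # rolling window length from the comment above
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


class RollingWindow:
    """Keeps only the most recent `max_samples` of audio."""

    def __init__(self, max_samples: int = WINDOW_SAMPLES):
        self.max_samples = max_samples
        self.buffer = np.zeros(0, dtype=np.float32)

    def append(self, chunk: np.ndarray) -> np.ndarray:
        """Append a new audio chunk and trim to the window length.

        Returns the current window, ready to hand to a transcription
        call (e.g. a faster-whisper model) each time new audio arrives.
        """
        self.buffer = np.concatenate([self.buffer, chunk.astype(np.float32)])
        if len(self.buffer) > self.max_samples:
            # Drop the oldest samples so the window slides forward
            self.buffer = self.buffer[-self.max_samples:]
        return self.buffer
```

One caveat with this scheme is that words at the leading edge of the window can be re-transcribed (or cut off) on successive calls, which may be related to the hallucination/missed-audio behavior discussed in this thread.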
Alright, intuitively that makes sense, but when I used it with large models it did perform much faster than the unbatched version and gave good results (very similar to the...