
online beam search inference

Open · janvainer opened this issue 5 years ago · 8 comments

Are there any plans to support online inference with beam search?

Use case: Audio arrives in n-millisecond chunks and is processed by an acoustic model that outputs a log-prob frame for each audio chunk. Each frame is fed to the beam decoder on the fly to produce the next best hypothesis.

janvainer (Aug 12 '20 14:08)
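As an illustration of the use case above: a decoder that keeps its beam state between chunks can extend existing hypotheses incrementally instead of re-decoding from the start. The sketch below is a toy, pure-Python CTC prefix beam search, not the ctcdecode API; the class and method names are invented for illustration, and probabilities are kept in linear space for readability (a real decoder would work in log space).

```python
from collections import defaultdict

class StreamingCTCBeamSearch:
    """Toy CTC prefix beam search that keeps its beams between chunks."""

    def __init__(self, blank=0, beam_width=10):
        self.blank = blank
        self.beam_width = beam_width
        # prefix -> [p_blank, p_non_blank]: probability of decoding this
        # prefix with the last frame being blank / non-blank.
        self.beams = {(): [1.0, 0.0]}

    def step(self, chunk):
        """Consume a chunk: a list of frames, each a list of per-symbol
        probabilities (index 0 is the blank)."""
        for frame in chunk:
            next_beams = defaultdict(lambda: [0.0, 0.0])
            for prefix, (p_b, p_nb) in self.beams.items():
                for c, p in enumerate(frame):
                    if c == self.blank:
                        next_beams[prefix][0] += (p_b + p_nb) * p
                    elif prefix and c == prefix[-1]:
                        # repeated symbol merges unless separated by a blank
                        next_beams[prefix][1] += p_nb * p
                        next_beams[prefix + (c,)][1] += p_b * p
                    else:
                        next_beams[prefix + (c,)][1] += (p_b + p_nb) * p
            # prune to the top beam_width prefixes by total probability
            ranked = sorted(next_beams.items(), key=lambda kv: -sum(kv[1]))
            self.beams = dict(ranked[: self.beam_width])

    def best(self):
        """Current best hypothesis, as a tuple of symbol indices."""
        return max(self.beams, key=lambda k: sum(self.beams[k]))
```

Each `step` call consumes only the new frames; the beams carried in `self.beams` play the role of the initial scores and hypotheses discussed later in this thread.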

It is doable with the current state of the library even though there isn't explicit support.

This repo does exactly what you're talking about: https://github.com/LearnedVector/A-Hackers-AI-Voice-Assistant

Specifically, the code in this subfolder, in the following files:

  • demo/demo.py
  • decoder.py
  • engine.py

Hope that helps!

rbracco (Aug 12 '20 19:08)

Thanks! :) I checked out the repo and it seems to me (correct me if I am wrong) that the beam search is recomputed from scratch as new chunks arrive, which may be inefficient. Is that correct? Is there a way to give the decoder some initial score and hypotheses matrix to start with?

janvainer (Aug 13 '20 07:08)
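For contrast, the recompute-from-scratch pattern described in the question can be sketched as follows (greedy best-path decoding stands in for the full beam search, and the class name is invented): every new chunk is appended to a buffer that is then re-decoded in full, so the per-chunk cost grows with the total utterance length.

```python
class NaiveStreamingDecoder:
    """Re-decodes the entire accumulated buffer on every new chunk."""

    def __init__(self, blank=0):
        self.blank = blank
        self.frames = []  # all log-prob/prob frames seen so far

    def on_chunk(self, chunk):
        # append the new frames, then re-decode everything from the start
        self.frames.extend(chunk)
        return self._greedy_decode(self.frames)

    def _greedy_decode(self, frames):
        # best path: argmax per frame, merge repeats, drop blanks
        path = [max(range(len(f)), key=f.__getitem__) for f in frames]
        out, prev = [], None
        for c in path:
            if c != prev and c != self.blank:
                out.append(c)
            prev = c
        return tuple(out)
```

A stateful decoder avoids this by carrying its hypotheses between chunks, which is exactly what the "initial score and hypotheses matrix" question is asking for.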

It does seem that way... there was functionality integrated to do online beam search in this PR: https://github.com/parlance/ctcdecode/pull/112

I haven't explored this myself, but might be worth checking!

SeanNaren (Aug 13 '20 08:08)

Thanks for the link. It looks promising. I will look into it :)

janvainer (Aug 13 '20 15:08)

> Thanks! :) I checked out the repo and it seems to me (correct me if I am wrong) that the beam search is recomputed as new chunks arrive, which may be inefficient. Is that correct? Is there a way to give the decoder some initial score and hypotheses matrix to start with?

You are correct, nice catch and thanks for checking the repo.

rbracco (Aug 13 '20 20:08)

> Thanks! :) I checked out the repo and it seems to me (correct me if I am wrong) that the beam search is recomputed as new chunks arrive, which may be inefficient. Is that correct? Is there a way to give the decoder some initial score and hypotheses matrix to start with?

> You are correct, nice catch and thanks for checking the repo.

The repo is pretty nice. I do not have time to try running the inference, but I would be interested to know whether it can do somewhat real-time inference :)

janvainer (Aug 14 '20 06:08)

Here's a video from the repo's author showing a demo at the start: https://www.youtube.com/watch?v=YereI6Gn3bM. His setup uses a beam width of 100.


rbracco (Aug 15 '20 12:08)

Hello, I made Python bindings for online decoding: https://github.com/parlance/ctcdecode/pull/188

stas6626 (May 24 '21 11:05)