online beam search inference
Are there any plans to support online inference with beam search?
Use case: Audio comes in n-millisecond chunks and is processed by an acoustic model that outputs a log-probability frame for each audio chunk. Each frame is fed to the beam decoder on the fly to produce the next best hypothesis.
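For context, the state-carrying idea can be sketched with a toy CTC prefix beam search (this is not the ctcdecode API, just a minimal self-contained illustration): each call advances the beams over one chunk and returns them, so the next chunk continues from where the previous one stopped instead of recomputing from scratch.

```python
from collections import defaultdict

def decode_chunk(beams, chunk, beam_width=10):
    """Advance a CTC prefix beam search over one chunk of frames.

    beams: dict mapping label-tuple prefixes to (p_blank, p_non_blank)
           probability mass; pass the return value back in with the
           next chunk to continue decoding online.
    chunk: iterable of per-frame probability vectors, index 0 = blank.
    """
    for probs in chunk:
        new_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            # stay on the same prefix via a blank emission
            new_beams[prefix][0] += probs[0] * (p_b + p_nb)
            if prefix:
                # a repeated emission of the last label collapses onto the prefix
                new_beams[prefix][1] += probs[prefix[-1]] * p_nb
            for c in range(1, len(probs)):
                ext = prefix + (c,)
                if prefix and c == prefix[-1]:
                    # extending with the same label requires an intervening blank
                    new_beams[ext][1] += probs[c] * p_b
                else:
                    new_beams[ext][1] += probs[c] * (p_b + p_nb)
        # prune to the most probable prefixes
        kept = sorted(new_beams.items(),
                      key=lambda kv: -(kv[1][0] + kv[1][1]))[:beam_width]
        beams = {k: tuple(v) for k, v in kept}
    return beams

# Toy alphabet: 0 = blank, 1 = 'a', 2 = 'b'.
beams = {(): (1.0, 0.0)}                        # initial state: empty hypothesis
beams = decode_chunk(beams, [[0.1, 0.8, 0.1]])  # first audio chunk
beams = decode_chunk(beams, [[0.1, 0.1, 0.8]])  # next chunk, state carried over
best = max(beams, key=lambda p: sum(beams[p]))
print(best)  # (1, 2) -> "ab"
```

The key point is that the returned `beams` dict plays the role of the "initial score and hypothesis matrix" asked about below: a decoder exposing such state can resume mid-utterance rather than rerunning the search over all accumulated frames.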
It is doable with the current state of the library, even though there isn't explicit support.
This repo does exactly what you're talking about: https://github.com/LearnedVector/A-Hackers-AI-Voice-Assistant
Specifically the code in this subfolder in the following files:
- demo/demo.py
- decoder.py
- engine.py
Hope that helps!
Thanks! :) I checked out the repo, and it seems to me (correct me if I am wrong) that the beam search is recomputed from scratch as new chunks arrive, which may be inefficient. Is that correct? Is there a way to give the decoder an initial score and hypothesis matrix to start with?
It does seem that way... there was functionality integrated to do online beam search in this PR: https://github.com/parlance/ctcdecode/pull/112
I haven't explored this myself, but might be worth checking!
Thanks for the link. It looks promising. I will look into it :)
You are correct, nice catch and thanks for checking the repo.
The repo is pretty nice. I do not have time to try running the inference, but I would be interested to know whether it can do somewhat real-time inference :)
Here's a video from the repo's author showing a demo at the start: https://www.youtube.com/watch?v=YereI6Gn3bM. His settings are using a beam width of 100.
https://github.com/parlance/ctcdecode/pull/188 Hello, I made Python bindings for online decoding.