Batch bug in ASR interface
Hello! I think you have a bug here, because there is no need to use BatchBeamSearch if batch_size == 1: https://github.com/espnet/espnet/blob/4d6c76a32e003538b490d253eceaefd079989df1/espnet2/bin/asr_inference.py#L153
BatchBeamSearch is used to batchify the multiple hypotheses of a single utterance. It is not about multiple utterances.
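To illustrate the distinction, here is a toy sketch (not ESPnet code; the function name and padding scheme are invented for illustration) of what "batch" means inside BatchBeamSearch: the running beam hypotheses of one utterance are padded to a common length so the decoder can score all of them in a single forward pass.

```python
# Toy illustration only: "batch" here is the beam of ONE utterance,
# not a batch of utterances.

def pad_hypotheses(hyps, pad_id=0):
    """Stack variable-length hypothesis prefixes into one padded 'batch'."""
    max_len = max(len(h) for h in hyps)
    return [h + [pad_id] * (max_len - len(h)) for h in hyps]

# Beam of 3 partial hypotheses for a single utterance:
beam = [[1, 5], [1, 7, 2], [1]]
batched = pad_hypotheses(beam)
# All hypotheses now share one shape, so the decoder can score them together
# in one forward pass instead of looping over the beam.
```

Even with batch_size == 1 at the utterance level, this hypothesis-level batching is what makes the beam search efficient.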
@ShigekiKarita, this part is a bit confusing.
Can you add some comments to clarify it?
Oh, I see, thank you!
I understand that "BatchBeamSearch" is for multiple hypotheses. If I want to run beam search (decoding) on batched input (multiple utterances), what do you recommend?
The easy way is just to throw multiple single-utterance jobs to multiple GPUs.
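A minimal sketch of that suggestion (the grouping helpers below are hypothetical, not an ESPnet API): split the utterance list round-robin over the available GPUs and decode each shard as an independent single-utterance job.

```python
# Sketch only: distributing single-utterance decoding jobs across GPUs.
# In practice each worker process would build its own Speech2Text object
# on its assigned device; here we only show the job assignment.

def assign_devices(utt_ids, n_gpus):
    """Round-robin: map each utterance id to a GPU index."""
    return {utt: i % n_gpus for i, utt in enumerate(utt_ids)}

def shard_by_device(assignment, n_gpus):
    """Group utterance ids into one work list per GPU."""
    shards = [[] for _ in range(n_gpus)]
    for utt, dev in assignment.items():
        shards[dev].append(utt)
    return shards

assignment = assign_devices(["utt1", "utt2", "utt3", "utt4", "utt5"], n_gpus=2)
shards = shard_by_device(assignment, n_gpus=2)
# Each shard can then be decoded by a separate process pinned to one GPU,
# e.g. via CUDA_VISIBLE_DEVICES or a multiprocessing pool.
```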
In espnet1, we have a multiple-utterance batch beam search, and it works very well. However, the code becomes very complicated, and the benefit does not scale well (see Section 4.5 in https://www.isca-speech.org/archive_v0/Interspeech_2019/pdfs/2860.pdf). So we have not implemented it in espnet2 yet.
I think this is OK for most inference scenarios, but multiple-utterance batching would be useful for sequence discriminative training (the espnet1 implementation was motivated by this). So we may implement it in the future.
Hi @sw005320, when you mention

> In espnet1, we have a multiple-utterance batch beam search, and it works very well.

could you please point to the relevant part of the code or documentation? That would be very helpful.
This is based on https://github.com/espnet/espnet/blob/b008ac7d58e9ced1a9f8c89cc85ee69d9e9461ab/espnet/nets/pytorch_backend/e2e_asr.py#L406