Batch bug in ASR interface
Hello! I think you have a bug here, because there is no need to use BatchBeamSearch if batch_size == 1: https://github.com/espnet/espnet/blob/4d6c76a32e003538b490d253eceaefd079989df1/espnet2/bin/asr_inference.py#L153
BatchBeamSearch is used to batchify the multiple hypotheses of a single utterance. It is not about multiple utterances.
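To illustrate the distinction, here is a toy sketch (not ESPnet code; the function name and padding scheme are invented for illustration) of what "batch" means inside BatchBeamSearch: the running beam hypotheses of one utterance are padded to a common length so the decoder can score all of them in a single forward pass.

```python
# Toy illustration only: "batch" here is the beam of ONE utterance,
# not a batch of utterances.

def pad_hypotheses(hyps, pad_id=0):
    """Stack variable-length hypothesis prefixes into one padded 'batch'."""
    max_len = max(len(h) for h in hyps)
    return [h + [pad_id] * (max_len - len(h)) for h in hyps]

# Beam of 3 partial hypotheses for a single utterance:
beam = [[1, 5], [1, 7, 2], [1]]
batched = pad_hypotheses(beam)
# All hypotheses now share one shape, so the decoder can score them together
# in one forward pass instead of looping over the beam.
```

Even with batch_size == 1 at the utterance level, this hypothesis-level batching is what makes the beam search efficient.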
@ShigekiKarita, this part is a bit confusing.
Can you add some comments to clarify it?
Oh, I see, thank you!
I understand that "BatchBeamSearch" is for multiple hypotheses. If I want to run beam search (decoding) on batched input (multiple utterances), what do you recommend?
The easy way is just to throw multiple single-utterance jobs to multiple GPUs.
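A minimal sketch of that suggestion (the grouping helpers below are hypothetical, not an ESPnet API): split the utterance list round-robin over the available GPUs and decode each shard as an independent single-utterance job.

```python
# Sketch only: distributing single-utterance decoding jobs across GPUs.
# In practice each worker process would build its own Speech2Text object
# on its assigned device; here we only show the job assignment.

def assign_devices(utt_ids, n_gpus):
    """Round-robin: map each utterance id to a GPU index."""
    return {utt: i % n_gpus for i, utt in enumerate(utt_ids)}

def shard_by_device(assignment, n_gpus):
    """Group utterance ids into one work list per GPU."""
    shards = [[] for _ in range(n_gpus)]
    for utt, dev in assignment.items():
        shards[dev].append(utt)
    return shards

assignment = assign_devices(["utt1", "utt2", "utt3", "utt4", "utt5"], n_gpus=2)
shards = shard_by_device(assignment, n_gpus=2)
# Each shard can then be decoded by a separate process pinned to one GPU,
# e.g. via CUDA_VISIBLE_DEVICES or a multiprocessing pool.
```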
In espnet1, we have a multiple-utterance batch beam search, and it works very well. However, the code becomes very complicated, and the benefit does not scale well (see Section 4.5 in https://www.isca-speech.org/archive_v0/Interspeech_2019/pdfs/2860.pdf). So we have not implemented it in espnet2 yet.
I think this is OK for most inference scenarios, but multiple-utterance batching would be useful for sequence discriminative training (the espnet1 implementation was motivated by this). So we may implement it in the future.
Hi @sw005320, when you mention

> In espnet1, we have a multiple-utterance batch beam search, and it works very well.

could you please point to the relevant part of the code or documentation? That would be very helpful.
This is based on https://github.com/espnet/espnet/blob/b008ac7d58e9ced1a9f8c89cc85ee69d9e9461ab/espnet/nets/pytorch_backend/e2e_asr.py#L406