Batch bug in asr interface

Open Stasiche opened this issue 4 years ago • 7 comments

Hello! I think there is a bug here: there is no need to use BatchBeamSearch when batch_size == 1. https://github.com/espnet/espnet/blob/4d6c76a32e003538b490d253eceaefd079989df1/espnet2/bin/asr_inference.py#L153

Stasiche avatar Jan 09 '22 11:01 Stasiche

BatchBeamSearch batches the multiple hypotheses within the beam; it is not about batching multiple utterances. @ShigekiKarita, this part is a bit confusing. Can you add some comments to clarify it?
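
To illustrate the distinction, here is a minimal pure-Python sketch of a beam search step in which the batch dimension is the beam for one utterance. It is not ESPnet code: `toy_batch_scorer`, `batched_beam_step`, and `decode_single_utterance` are hypothetical names, and the toy scorer stands in for what would presumably be a single batched forward pass in a real decoder.

```python
import math
from typing import List, Tuple

Hyp = Tuple[List[int], float]  # (token prefix, accumulated log-probability)


def toy_batch_scorer(prefixes: List[List[int]], vocab_size: int) -> List[List[float]]:
    """Hypothetical scorer: one call scores every prefix in the beam.

    In a real decoder this would be one forward pass whose batch
    dimension is the number of hypotheses, not the number of utterances.
    """
    out = []
    for prefix in prefixes:
        # Toy distribution: prefer the token after the last one (mod vocab).
        last = prefix[-1] if prefix else 0
        logits = [2.0 if t == (last + 1) % vocab_size else 0.0
                  for t in range(vocab_size)]
        log_z = math.log(sum(math.exp(x) for x in logits))
        out.append([x - log_z for x in logits])
    return out


def batched_beam_step(beam: List[Hyp], vocab_size: int, beam_size: int) -> List[Hyp]:
    """Extend all hypotheses at once, then keep the best `beam_size`."""
    prefixes = [prefix for prefix, _ in beam]
    log_probs = toy_batch_scorer(prefixes, vocab_size)  # single batched call
    candidates = [
        (prefix + [tok], score + lp[tok])
        for (prefix, score), lp in zip(beam, log_probs)
        for tok in range(vocab_size)
    ]
    candidates.sort(key=lambda h: h[1], reverse=True)
    return candidates[:beam_size]


def decode_single_utterance(vocab_size: int = 4, beam_size: int = 2,
                            max_len: int = 3) -> List[Hyp]:
    """Decode ONE utterance; the batching happens over its hypotheses."""
    beam: List[Hyp] = [([], 0.0)]
    for _ in range(max_len):
        beam = batched_beam_step(beam, vocab_size, beam_size)
    return beam
```

Even with a single input utterance, the scorer receives up to `beam_size` prefixes per step, which is why batching over hypotheses can still pay off when batch_size == 1.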

sw005320 avatar Jan 10 '22 15:01 sw005320

Oh, I see, thank you!)

Stasiche avatar Jan 13 '22 13:01 Stasiche

I understand that "BatchBeamSearch" is for multiple hypotheses. If I want to run beam search (decoding) on batched input (multiple utterances), what do you recommend?

wonkyuml avatar Jan 20 '22 04:01 wonkyuml

The easy way is to just send multiple single-utterance jobs to multiple GPUs.
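
A rough sketch of that idea, under stated assumptions: the utterance list is split into one shard per GPU, and each job decodes its shard one utterance at a time. `shard_round_robin`, `run_job`, `decode_all`, and the placeholder `decode_one` are hypothetical names, not ESPnet APIs. Threads are used only to keep the sketch runnable anywhere; in practice each job would more likely be a separate process or scheduler job bound to its own GPU.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def shard_round_robin(utt_ids: List[str], n_jobs: int) -> List[List[str]]:
    """Split the utterance list into n_jobs roughly equal shards."""
    return [utt_ids[i::n_jobs] for i in range(n_jobs)]


def decode_one(utt_id: str, device: str) -> str:
    """Hypothetical placeholder for single-utterance beam search on one device."""
    return f"<hyp for {utt_id} on {device}>"


def run_job(job_index: int, shard: List[str]) -> Dict[str, str]:
    """One job: decode its shard sequentially, one utterance at a time."""
    device = f"cuda:{job_index}"  # assumption: one GPU per job
    return {utt_id: decode_one(utt_id, device) for utt_id in shard}


def decode_all(utt_ids: List[str], n_jobs: int) -> Dict[str, str]:
    """Run all jobs concurrently and merge their results."""
    shards = shard_round_robin(utt_ids, n_jobs)
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        for partial in pool.map(run_job, range(n_jobs), shards):
            results.update(partial)
    return results
```

For example, `decode_all(["utt1", "utt2", "utt3"], n_jobs=2)` would assign `utt1` and `utt3` to the first job and `utt2` to the second.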

In espnet1, we have a multiple-utterance batch beam search, and it works very well. However, the code becomes very complicated, and the benefit does not scale well (see Section 4.5 in https://www.isca-speech.org/archive_v0/Interspeech_2019/pdfs/2860.pdf). So we have not implemented it in espnet2 yet.

I think this is OK for most inference scenarios, but a multiple-utterance batch would be useful for sequence discriminative training (the espnet1 implementation was motivated by this). So we may implement it in the future.

sw005320 avatar Jan 21 '22 02:01 sw005320

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 16 '22 13:04 stale[bot]

Hi @sw005320, when you mention

> In espnet1, we have a multiple-utterance batch beam search, and it works very well.

Can you please point to the relevant part of the code or documentation? It would be very helpful.

tarun73 avatar Jun 03 '22 07:06 tarun73

This is based on https://github.com/espnet/espnet/blob/b008ac7d58e9ced1a9f8c89cc85ee69d9e9461ab/espnet/nets/pytorch_backend/e2e_asr.py#L406

sw005320 avatar Jun 03 '22 09:06 sw005320