
Wav2Vec2ProcessorWithLM can return N best hypotheses now

vsokolovskii opened this pull request 2 years ago • 5 comments

What does this PR do?

Fixes #22150. The user can now specify the number of hypotheses to be returned after the decoding stage. If the specified number is higher than the actual number of hypotheses, all hypotheses are returned. This is useful when the user wants to rescore the n-best hypotheses (see the motivation in the linked issue). The Wav2Vec2DecoderWithLMOutput class was already prepared for this feature, and a comment in the code said this feature would eventually be added, so here it is.

I tried not to break anything that relies on the current version of the decode function; the docstring is updated with the new parameter. All tests pass, and the code is well formatted.
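The clamping behaviour described above can be sketched in isolation. This is a hypothetical illustration, not the transformers implementation: `select_n_best` and `beams` are made-up names standing in for the decoder's scored beam output.

```python
# Hypothetical sketch of the n_best behaviour described above: the decoder
# produces scored beams, and we return min(n_best, len(beams)) hypotheses.
# select_n_best and beams are illustrative names, not the transformers API.

def select_n_best(beams, n_best=1):
    """Return up to n_best (text, score) hypotheses, best first.

    If n_best exceeds the number of available beams, all beams are
    returned, mirroring the clamping behaviour described in the PR.
    """
    ranked = sorted(beams, key=lambda b: b[1], reverse=True)
    return ranked[:n_best]

beams = [("hello world", -1.2), ("hello word", -3.4), ("yellow world", -5.0)]
print(select_n_best(beams, n_best=2))   # the two best hypotheses
print(select_n_best(beams, n_best=10))  # clamped to the 3 available beams
```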


@younesbelkada @ArthurZucker @sanchit-gandhi , does this make sense to you, guys? Is there anything else I should add?

vsokolovskii avatar Mar 17 '23 16:03 vsokolovskii

The documentation is not available anymore as the PR was closed or merged.

We might have more luck with @sanchit-gandhi ;-)

sgugger avatar Mar 20 '23 13:03 sgugger

Thanks a lot for the PR @vsokolovskii,

Just to better understand what happens now in case we decode a batch of logits with n_best > 1 -> will we return a list of lists of text in this case?

Wondering if that's the API that we want - @sanchit-gandhi wdyt?

Take a look at the description of the output class arguments you have: you had already prepared everything for this change, and I just added the return statement. There should be a way to get more than one hypothesis from the ASR in order to rescore it with a larger model; take a look at the motivation section in the linked issue. 🤗
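The rescoring motivation mentioned above can be sketched as follows. This is purely illustrative: `rescore`, `external_lm_score`, and the toy LM are invented names; in practice the external score would come from a large language model, not a lambda.

```python
# Illustrative sketch of n-best rescoring: score each hypothesis with a
# stronger external LM and re-rank. external_lm_score stands in for a real
# model call (e.g. a log-probability from a large LM) and is hypothetical.

def rescore(hypotheses, external_lm_score, lm_weight=0.5):
    """Re-rank (text, decoder_score) pairs using an external LM score."""
    rescored = [
        (text, decoder_score + lm_weight * external_lm_score(text))
        for text, decoder_score in hypotheses
    ]
    return sorted(rescored, key=lambda h: h[1], reverse=True)

# Toy stand-in LM that prefers the phrase "hello world".
toy_lm = lambda text: 0.0 if text == "hello world" else -2.0

n_best = [("hello word", -1.0), ("hello world", -1.5)]
print(rescore(n_best, toy_lm))  # "hello world" overtakes after rescoring
```

This is exactly why returning only the single best decoder hypothesis is limiting: the decoder's top candidate is not always the best one once a stronger model weighs in.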

vsokolovskii avatar Mar 21 '23 14:03 vsokolovskii

@sanchit-gandhi aha... got it. Check out the new changes, please.

Very cool feature @vsokolovskii! Regarding @patrickvonplaten's question about batch decoding, we don't actually have the argument n_best for the batch_decode method; it's only for the single-item decode method. So currently, we'd never be returning batches of n-best hypotheses.

WDYT about adding n_best to the batch_decode method as well @vsokolovskii? In this case, I think we should match the output format to generate's beam search method as [batches * num_sequences, output_sequences] (see https://huggingface.co/docs/transformers/internal/generation_utils#transformers.generation.BeamSearchDecoderOnlyOutput.sequences)
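The proposed batch output layout could look like the sketch below: flattening each utterance's n-best list into a single list of length batch_size * n_best, matching the [batches * num_sequences, output_sequences] shape of generate's beam search. The function name and data are illustrative, not the merged implementation.

```python
# Sketch of the proposed batch_decode output layout: flatten per-utterance
# n-best lists into one list of length batch_size * n_best. Illustrative
# only; flatten_batch_nbest is not a transformers function.

def flatten_batch_nbest(batched_hypotheses):
    """[[utt0_hyp0, utt0_hyp1], [utt1_hyp0, utt1_hyp1]]
       -> [utt0_hyp0, utt0_hyp1, utt1_hyp0, utt1_hyp1]"""
    return [hyp for utt in batched_hypotheses for hyp in utt]

# A batch of 2 utterances, each with n_best=2 hypotheses.
batch = [["a cat sat", "a cat sang"], ["the dog ran", "the dog rang"]]
flat = flatten_batch_nbest(batch)
print(len(flat))  # 2 utterances * 2 hypotheses = 4
```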

vsokolovskii avatar Mar 25 '23 15:03 vsokolovskii

@ArthurZucker @amyeroberts , could you please rerun the tests once the pipeline is fixed? I believe the failure is not caused by my changes.

vsokolovskii avatar Mar 25 '23 15:03 vsokolovskii

The code quality check not passing is not due to your PR at first glance, but to make sure, could you rebase on main? It has been fixed on the main branch.

sgugger avatar Mar 27 '23 13:03 sgugger


Thanks! I forgot to update my fork.

vsokolovskii avatar Mar 27 '23 14:03 vsokolovskii