Wav2Vec2ProcessorWithLM can now return the N best hypotheses
What does this PR do?
Fixes #22150. The user can now specify the number of hypotheses to be returned after the decoding stage. If the specified number is higher than the actual number of hypotheses, all hypotheses are returned. This is useful when the user wants to run rescoring on the n-best hypotheses (see the motivation in the linked issue). The `Wav2Vec2DecoderWithLMOutput` class was already prepared for this feature, and a comment in the code said the feature would eventually be added, so here it is.
I tried not to break anything that relies on the current version of the `decode` function; the docstring is updated with the new parameter. All tests pass, and the code is formatted.
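For reference, a minimal usage sketch (the checkpoint and dataset below are the ones from the docs, not part of this PR, and I'm assuming the parameter keeps the `n_best` name used in the diff):

```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

# Checkpoint that ships with an attached KenLM language model.
model_id = "patrickvonplaten/wav2vec2-base-100h-with-lm"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# New in this PR: ask the decoder for the 3 best hypotheses instead of only the top one.
output = processor.decode(logits[0].numpy(), n_best=3)
print(output.text)  # with n_best > 1, a list of up to 3 transcriptions
```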
Before submitting
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests? Is this necessary for such a small feature?
@younesbelkada @ArthurZucker @sanchit-gandhi, does this make sense to you? Is there anything else I should add?
We might have more luck with @sanchit-gandhi ;-)
Thanks a lot for the PR @vsokolovskii,
Just to better understand: what happens now in case we decode a batch of logits with `n_best > 1` -> will we return a list of a list of texts in this case? Wondering if that's the API that we want - @sanchit-gandhi wdyt?
Take a look at the description of the output class arguments you already have; you've prepared everything for this change, and I just added the return statement. There should be a way to get more than one hypothesis from the ASR model in order to rescore them with a larger model; see the motivation section in the linked issue and the sketch below. 🤗
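To make the motivation concrete, here is a rough sketch of that rescoring step (GPT-2 and the NLL-based scoring are just placeholders for "a larger model", not part of this PR):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative rescorer; any larger LM could be plugged in here.
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm_tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def rescore(hypotheses):
    """Pick the hypothesis with the lowest LM negative log-likelihood."""
    scores = []
    for text in hypotheses:
        ids = lm_tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            nll = lm(ids, labels=ids).loss  # mean NLL per token
        scores.append(nll.item())
    return hypotheses[scores.index(min(scores))]

# e.g. the list returned in `output.text` when decoding with n_best=3
hypotheses = ["the cat sat on the mat", "the cat sad on the mat", "the cats sat on the mat"]
print(rescore(hypotheses))
```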
@sanchit-gandhi aha... got it. Check out the new changes, please.
Very cool feature @vsokolovskii! Regarding @patrickvonplaten's question about batch decoding, we don't actually have the argument `n_best` for the `batch_decode` method; it's only for the single-item `decode` method. So currently, we'd never be returning batches of n-best hypotheses.

WDYT about adding `n_best` to the `batch_decode` method as well @vsokolovskii? In this case, I think we should match the output format to generate's beam search method as `[batches * num_sequences, output_sequences]` (see https://huggingface.co/docs/transformers/internal/generation_utils#transformers.generation.BeamSearchDecoderOnlyOutput.sequences).
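Something like this hypothetical layout (nothing here is final API, just an illustration of the flattened shape):

```python
# Hypothetical illustration of the suggested flattened layout, mirroring
# generate's beam search output of [batch_size * num_sequences, ...].
batch_size, n_best = 2, 3
texts = ["a1", "a2", "a3", "b1", "b2", "b3"]  # stand-in for batch_decode(...).text

# The n_best hypotheses for sample i would sit at texts[i * n_best : (i + 1) * n_best].
per_sample = [texts[i * n_best : (i + 1) * n_best] for i in range(batch_size)]
assert per_sample[0] == ["a1", "a2", "a3"]
```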
@ArthurZucker @amyeroberts, could you please rerun the tests once the CI pipeline is fixed? I believe the failure is not caused by my changes.
At first glance, the failing code quality check is not due to your PR, but to make sure, could you rebase on main? It has been fixed there.
Thanks! Forgot to update my fork.