
Wav2Vec2ProcessorWithLM can return N best hypotheses now

vsokolovskii opened this pull request 2 years ago • 5 comments

What does this PR do?

Fixes #22150. The user can now specify the number of hypotheses to be returned after the decoding stage. If the specified number is higher than the actual number of hypotheses, all hypotheses are returned. This is useful when the user wants to rescore the n-best hypotheses (see the motivation in the linked issue). The Wav2Vec2DecoderWithLMOutput class was already prepared for this feature, and a comment in the code said this feature would eventually be added, so here it is.

I tried not to break anything that relies on the current version of the decode function; the docstring is updated with the new parameter. All tests pass, and the code is well formatted.
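The clamping behaviour described above can be sketched in isolation. This is a hypothetical illustration, not the transformers implementation: `select_n_best` and `beams` are made-up names standing in for the decoder's scored beam output.

```python
# Hypothetical sketch of the n_best behaviour described above: the decoder
# produces scored beams, and we return min(n_best, len(beams)) hypotheses.
# select_n_best and beams are illustrative names, not the transformers API.

def select_n_best(beams, n_best=1):
    """Return up to n_best (text, score) hypotheses, best first.

    If n_best exceeds the number of available beams, all beams are
    returned, mirroring the clamping behaviour described in the PR.
    """
    ranked = sorted(beams, key=lambda b: b[1], reverse=True)
    return ranked[:n_best]

beams = [("hello world", -1.2), ("hello word", -3.4), ("yellow world", -5.0)]
print(select_n_best(beams, n_best=2))   # the two best hypotheses
print(select_n_best(beams, n_best=10))  # clamped to the 3 available beams
```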


@younesbelkada @ArthurZucker @sanchit-gandhi , does this make sense to you, guys? Is there anything else I should add?

vsokolovskii avatar Mar 17 '23 16:03 vsokolovskii

The documentation is not available anymore as the PR was closed or merged.

We might have more luck with @sanchit-gandhi ;-)

sgugger avatar Mar 20 '23 13:03 sgugger

Thanks a lot for the PR @vsokolovskii,

Just to better understand what happens now in case we decode a batch of logits with n_best > 1 -> will we return a list of lists of text in this case?

Wondering if that's the API that we want - @sanchit-gandhi wdyt?

Take a look at the description of the output class arguments you have: you had already prepared everything for this change, and I just added the return statement. There should be a way to get more than one hypothesis from the ASR in order to rescore it with a larger model; take a look at the motivation section in the linked issue. 🤗
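The rescoring motivation mentioned above can be sketched as follows. This is purely illustrative: `rescore`, `external_lm_score`, and the toy LM are invented names; in practice the external score would come from a large language model, not a lambda.

```python
# Illustrative sketch of n-best rescoring: score each hypothesis with a
# stronger external LM and re-rank. external_lm_score stands in for a real
# model call (e.g. a log-probability from a large LM) and is hypothetical.

def rescore(hypotheses, external_lm_score, lm_weight=0.5):
    """Re-rank (text, decoder_score) pairs using an external LM score."""
    rescored = [
        (text, decoder_score + lm_weight * external_lm_score(text))
        for text, decoder_score in hypotheses
    ]
    return sorted(rescored, key=lambda h: h[1], reverse=True)

# Toy stand-in LM that prefers the phrase "hello world".
toy_lm = lambda text: 0.0 if text == "hello world" else -2.0

n_best = [("hello word", -1.0), ("hello world", -1.5)]
print(rescore(n_best, toy_lm))  # "hello world" overtakes after rescoring
```

This is exactly why returning only the single best decoder hypothesis is limiting: the decoder's top candidate is not always the best one once a stronger model weighs in.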

vsokolovskii avatar Mar 21 '23 14:03 vsokolovskii

@sanchit-gandhi aha... got it. Check out the new changes, please.

Very cool feature @vsokolovskii! Regarding @patrickvonplaten's question about batch decoding, we don't actually have the argument n_best for the batch_decode method; it's only for the single-item decode method. So currently, we'd never be returning batches of n-best hypotheses.

WDYT about adding n_best to the batch_decode method as well @vsokolovskii? In this case, I think we should match the output format to generate's beam search method as [batches * num_sequences, output_sequences] (see https://huggingface.co/docs/transformers/internal/generation_utils#transformers.generation.BeamSearchDecoderOnlyOutput.sequences)
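The proposed batch output layout could look like the sketch below: flattening each utterance's n-best list into a single list of length batch_size * n_best, matching the [batches * num_sequences, output_sequences] shape of generate's beam search. The function name and data are illustrative, not the merged implementation.

```python
# Sketch of the proposed batch_decode output layout: flatten per-utterance
# n-best lists into one list of length batch_size * n_best. Illustrative
# only; flatten_batch_nbest is not a transformers function.

def flatten_batch_nbest(batched_hypotheses):
    """[[utt0_hyp0, utt0_hyp1], [utt1_hyp0, utt1_hyp1]]
       -> [utt0_hyp0, utt0_hyp1, utt1_hyp0, utt1_hyp1]"""
    return [hyp for utt in batched_hypotheses for hyp in utt]

# A batch of 2 utterances, each with n_best=2 hypotheses.
batch = [["a cat sat", "a cat sang"], ["the dog ran", "the dog rang"]]
flat = flatten_batch_nbest(batch)
print(len(flat))  # 2 utterances * 2 hypotheses = 4
```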

vsokolovskii avatar Mar 25 '23 15:03 vsokolovskii

@ArthurZucker @amyeroberts , could you please rerun the tests once the pipeline is fixed? I believe the failure is not caused by my changes.

vsokolovskii avatar Mar 25 '23 15:03 vsokolovskii

The code quality check not passing is not due to your PR at first glance, but to make sure, could you rebase on main? It has been fixed on the main branch.

sgugger avatar Mar 27 '23 13:03 sgugger


Thanks! I forgot to update my fork.

vsokolovskii avatar Mar 27 '23 14:03 vsokolovskii