
Whisper fix audio out of range

Open. maxkvbn opened this pull request (3 comments).

What does this PR do?

Fixes #31683

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. https://github.com/huggingface/transformers/issues/31683
  • [ ] Did you make sure to update the documentation with your changes? See the documentation guidelines and the tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

maxkvbn, Jul 03 '24

@sanchit-gandhi

maxkvbn, Jul 03 '24

Gentle ping @sanchit-gandhi @kamilakesbi

amyeroberts, Jul 29 '24

cc @ylacombe

amyeroberts, Aug 27 '24

Hey @maxkvbn, thanks for opening this PR and for #31683!

Could you send the audio that causes your code to fail? The download link no longer works. Better yet, could you share an end-to-end snippet that reproduces the failure? I'll review your PR once I can reproduce the error!

ylacombe, Sep 03 '24

Hey @maxkvbn, gentle ping!

ylacombe, Sep 12 '24

Hey @ylacombe

I've re-uploaded the audio in question: https://fastupload.io/b9ff61358ef5c4b1

maxkvbn, Sep 13 '24

Thanks for the audio @maxkvbn, I was able to reproduce the issue!

I believe the issue occurs when using num_beams>1, which gives some indication of what might have gone wrong. Let me dig a bit further.

ylacombe, Sep 17 '24

Hey @maxkvbn, after some deeper investigation, I'm fairly certain the bug comes from the generation itself, not from the tokenization!

I still need to figure out how to fix it, but it stems from a recent change in Whisper.generate. I'll ping you once it's fixed!

ylacombe, Sep 17 '24