Whisper fix audio out of range
What does this PR do?
Fixes #31683
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. https://github.com/huggingface/transformers/issues/31683
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@sanchit-gandhi
Gentle ping @sanchit-gandhi @kamilakesbi
cc @ylacombe
Hey @maxkvbn, thanks for opening this PR and for #31683!
Do you think you could send the audio that causes your code to fail? The link to download it is no longer working. Or better yet, an end-to-end snippet to reproduce it. If that's okay with you, I'll review your PR once I can reproduce the error!
Hey @maxkvbn, gentle bump!
Hey @ylacombe
I've reuploaded the audio in question: https://fastupload.io/b9ff61358ef5c4b1
Thanks for the audio @maxkvbn, I was able to reproduce the issue!
I believe the issue happens when using num_beams>1, which gives a small hint about what could have gone wrong. Let me look into it a bit more.
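For anyone who wants to check the num_beams observation locally, here is a minimal sketch that runs the same audio with greedy decoding and with beam search and records which settings fail. Everything specific in it is an assumption rather than something taken from this thread: the checkpoint name `openai/whisper-tiny`, the helper names `transcribe` and `compare_beams`, the beam sizes, and the audio path are all placeholders. The thread also does not say whether extra options such as timestamps are needed to trigger the error, so any such options can be passed through as keyword arguments.

```python
def transcribe(audio_path, num_beams, model_id="openai/whisper-tiny", **call_kwargs):
    """Transcribe one file with a given beam size (hypothetical helper).

    model_id is an assumed checkpoint, not the one from the original report.
    """
    # Imported lazily so the helpers below can be loaded without transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # generate_kwargs is forwarded to the model's generate() call.
    result = asr(audio_path, generate_kwargs={"num_beams": num_beams}, **call_kwargs)
    return result["text"]


def compare_beams(audio_path, beam_sizes=(1, 5), transcribe_fn=transcribe, **call_kwargs):
    """Run the same audio with several beam sizes and record success or failure."""
    outcomes = {}
    for n in beam_sizes:
        try:
            outcomes[n] = ("ok", transcribe_fn(audio_path, n, **call_kwargs))
        except Exception as exc:  # keep going so every beam size gets reported
            outcomes[n] = ("error", f"{type(exc).__name__}: {exc}")
    return outcomes
```

Calling `compare_beams("path/to/audio.wav")` with the reuploaded file in place of the placeholder path would show whether only the num_beams>1 run ends in an `"error"` entry. Extra pipeline options, for example `return_timestamps=True`, can be added to the call if they turn out to be needed.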
Hey @maxkvbn, after some deeper investigation, I'm fairly certain the bug comes from the generation itself, and not the tokenization!
I still have to figure out how to actually fix it, but it comes from a recent change in Whisper.generate. I'll ping you once it's fixed!