transformers
Uniform kwargs for processors of audio-text models
What does this PR do?
- Uniformizes kwargs for processors of audio-text models.
- An extension of https://github.com/huggingface/transformers/issues/31911
- NOTE: do not review or merge this PR until the following PR is complete: https://github.com/huggingface/transformers/pull/32841
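For reviewers unfamiliar with the goal, here is a minimal, self-contained sketch of what "uniform kwargs" means for an audio-text processor: flat keyword arguments get routed into per-modality groups, nested groups override defaults, and shared options are copied to every modality. It is an illustration only, not the code in this PR or in the library; `DEFAULTS`, `KNOWN_KEYS`, and `merge_kwargs` are hypothetical names, and the default values are assumptions.

```python
from typing import Any, Dict

# Hypothetical per-modality defaults; a real processor defines its own.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "text_kwargs": {"padding": False},
    "audio_kwargs": {"sampling_rate": 16000},
    "common_kwargs": {"return_tensors": None},
}

# Hypothetical routing table: which flat kwarg belongs to which group.
KNOWN_KEYS = {
    "text_kwargs": {"padding", "truncation", "max_length"},
    "audio_kwargs": {"sampling_rate"},
    "common_kwargs": {"return_tensors"},
}


def merge_kwargs(**kwargs: Any) -> Dict[str, Dict[str, Any]]:
    """Merge call-time kwargs into per-modality groups on top of the defaults."""
    merged = {group: dict(values) for group, values in DEFAULTS.items()}
    for key, value in kwargs.items():
        # Nested form: text_kwargs={...} / audio_kwargs={...} / common_kwargs={...}
        if key in merged and isinstance(value, dict):
            merged[key].update(value)
            continue
        # Flat form: route the kwarg to the group that declares it.
        for group, keys in KNOWN_KEYS.items():
            if key in keys:
                merged[group][key] = value
                break
        else:
            raise TypeError(f"Unexpected keyword argument: {key!r}")
    # Common kwargs apply to every modality.
    for group in ("text_kwargs", "audio_kwargs"):
        merged[group].update(merged["common_kwargs"])
    return merged


if __name__ == "__main__":
    print(merge_kwargs(padding=True, sampling_rate=48000, return_tensors="pt"))
```

With a shared entry point like this, every audio-text processor could accept the same call signature and hand `text_kwargs` to its tokenizer and `audio_kwargs` to its feature extractor.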
TODO Models:
- [x] Clap
- [x] CLVP
- [x] MusicGen Melody
- [ ] Qwen2 Audio
- [x] Seamless M4T
- [x] SpeechT5
- [x] Wav2Vec2 Bert
TODO Tests:
- [ ] Add audio-text-specific processor tests
- [ ] Remove unnecessary/duplicated tests
Models with special args (will not be done in this PR):
- Pop2Piano
Models with weird _in_target_context_manager logic (will not be done in this PR):
- MusicGen
- SpeechToText
- Wav2Vec2
- Wav2Vec2 w/ LM
- Whisper
Fixes # (issue)
- https://github.com/huggingface/transformers/issues/31911
Who can review?
@zucchini-nlp @molbap @yonigozlan
@zucchini-nlp the tests are failing because of this: https://github.com/huggingface/transformers/pull/32921