Uniform kwargs for processors of audio-text models

Open • leloykun opened this issue 1 year ago • 1 comment

What does this PR do?

  • Uniformizes kwargs for processors of audio-text models.
  • An extension of https://github.com/huggingface/transformers/issues/31911
  • NOTE: please don't review or merge until this PR is complete: https://github.com/huggingface/transformers/pull/32841
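
For reviewers new to the effort, here is a rough sketch of what "uniform kwargs" means in practice, as I understand the linked issue: flat call-time kwargs get sorted into per-modality groups (`text_kwargs`, `audio_kwargs`, `common_kwargs`) on top of per-model defaults. Everything below is a toy stand-in written for illustration; `DEFAULTS`, `KEY_TO_GROUP`, and `merge_kwargs` are hypothetical names and this is not the library's actual implementation.

```python
# Toy illustration only: all names and default values here are hypothetical.

# Per-model defaults, grouped by modality.
DEFAULTS = {
    "text_kwargs": {"padding": False},
    "audio_kwargs": {"sampling_rate": 16000},
    "common_kwargs": {"return_tensors": None},
}

# Which group each flat kwarg is routed to.
KEY_TO_GROUP = {
    "padding": "text_kwargs",
    "max_length": "text_kwargs",
    "sampling_rate": "audio_kwargs",
    "return_tensors": "common_kwargs",
}


def merge_kwargs(**kwargs):
    """Route flat or nested kwargs into per-modality dicts on top of defaults."""
    merged = {group: dict(values) for group, values in DEFAULTS.items()}
    for key, value in kwargs.items():
        if key in merged:
            # Nested form, e.g. audio_kwargs={"sampling_rate": 8000}
            merged[key].update(value)
        elif key in KEY_TO_GROUP:
            # Flat form, e.g. sampling_rate=8000
            merged[KEY_TO_GROUP[key]][key] = value
        else:
            raise TypeError(f"unexpected kwarg: {key}")
    # Common kwargs are shared by every modality.
    for group in ("text_kwargs", "audio_kwargs"):
        merged[group].update(merged["common_kwargs"])
    return merged


merged = merge_kwargs(padding=True, sampling_rate=8000, return_tensors="pt")
print(merged["text_kwargs"])
print(merged["audio_kwargs"])
```

The point of the grouping is that the tokenizer only ever sees `text_kwargs` and the feature extractor only sees `audio_kwargs`, so every audio-text processor can share one call signature.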

TODO Models:

  • [x] Clap
  • [x] CLVP
  • [x] MusicGen Melody
  • [ ] Qwen2 Audio
  • [x] Seamless M4T
  • [x] SpeechT5
  • [x] Wav2Vec2 Bert

TODO tests

  • [ ] Add audio-text-specific processor tests
  • [ ] Remove unnecessary/duplicated tests

Models with special args (will not be done in this PR):

  • Pop2Piano

Models with weird _in_target_context_manager logic (will not be done in this PR):

  • MusicGen
  • SpeechToText
  • Wav2Vec2
  • Wav2Vec2 w/ LM
  • Whisper
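
For context on why these are deferred: as far as I can tell, these processors switch which sub-processor `__call__` dispatches to while inside a context manager, so positional args and kwargs can mean different things depending on state. A simplified, self-contained sketch of that pattern follows; `ToyProcessor` is a hypothetical class written for illustration and is not the library code.

```python
from contextlib import contextmanager


class ToyProcessor:
    """Hypothetical stand-in showing the stateful dispatch pattern."""

    def __init__(self, feature_extractor, tokenizer):
        self.feature_extractor = feature_extractor
        self.tokenizer = tokenizer
        self.current_processor = feature_extractor
        self._in_target_context_manager = False

    def __call__(self, *args, **kwargs):
        # Inside the context manager, everything is forwarded to the
        # tokenizer; outside it, inputs go to the feature extractor.
        if self._in_target_context_manager:
            return self.current_processor(*args, **kwargs)
        return self.feature_extractor(*args, **kwargs)

    @contextmanager
    def as_target_processor(self):
        self._in_target_context_manager = True
        self.current_processor = self.tokenizer
        try:
            yield
        finally:
            # Restore the default dispatch target on exit.
            self.current_processor = self.feature_extractor
            self._in_target_context_manager = False


processor = ToyProcessor(
    feature_extractor=lambda x: ("audio", x),
    tokenizer=lambda x: ("text", x),
)
print(processor("sample"))
with processor.as_target_processor():
    print(processor("sample"))
```

Because the same call routes to different components depending on hidden state, splitting kwargs by modality is not a mechanical change for these models.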

Fixes # (issue)

  • https://github.com/huggingface/transformers/issues/31911

Who can review?

@zucchini-nlp @molbap @yonigozlan

leloykun, Aug 21 '24 00:08

@zucchini-nlp the tests are failing because of this: https://github.com/huggingface/transformers/pull/32921

leloykun, Aug 21 '24 14:08