Yoach Lacombe issues

Results: 10 issues



Add bark

This PR aims to integrate Bark, a TTS model, into `transformers`. `Bark` was designed and trained by the [Suno AI team](https://github.com/suno-ai/bark) and is made of 4 main components: - A `semantic model...

Update MMS README with MMS TTS fine-tuning mention

In the same vein as the MMS ASR fine-tuning, I've created a [repo](https://github.com/ylacombe/finetune-hf-vits) that enables MMS TTS fine-tuning, and updated the MMS README to mention it! You can...

CLA Signed

Fix WhisperNoSpeechDetection when input is full silence

# What does this PR do? @cifkao found an edge case that occurs when the input to `Whisper.generate` is complete silence. This is a simple, tentative fix. cc @sanchit-gandhi...

Add loading from HF hub

As discussed in #2 and internally with @Zengyi-Qin, this PR adds HF Hub compatibility. I kept the option to download from the old links, but I can still clean it...

Add training compatibility for Musicgen-like models

This PR aims to add training compatibility for Musicgen and Musicgen Melody. The main difference from classic cross-entropy is that there are `num_codebooks` labels to predict per timestep instead of...
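A minimal sketch of what a multi-codebook loss can look like, assuming the per-codebook cross-entropies are simply averaged (one common choice; the preview is cut off before it says how the labels are combined). Labels equal to `-100` are treated as padding, following PyTorch's default `ignore_index` convention. The function names are hypothetical, and pure Python is used only to keep the example dependency-free.

```python
import math
from typing import List

IGNORE_INDEX = -100  # PyTorch's default ignore_index for padded labels


def cross_entropy(logits: List[float], label: int) -> float:
    """-log softmax(logits)[label], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def multi_codebook_loss(logits: List[List[List[float]]], labels: List[List[int]]) -> float:
    """Average over codebooks of the mean cross-entropy within each codebook.

    logits: [timesteps][num_codebooks][vocab_size]
    labels: [timesteps][num_codebooks], IGNORE_INDEX marks padding
    """
    num_codebooks = len(labels[0])
    total = 0.0
    for k in range(num_codebooks):
        # Each timestep carries one label per codebook, so every codebook
        # contributes its own classic cross-entropy term.
        losses = [
            cross_entropy(logits[t][k], labels[t][k])
            for t in range(len(labels))
            if labels[t][k] != IGNORE_INDEX
        ]
        total += sum(losses) / len(losses) if losses else 0.0
    return total / num_codebooks


# Uniform logits over a vocabulary of 4: every term equals log(4).
uniform = [[[0.0] * 4 for _ in range(2)] for _ in range(3)]
labels = [[0, 1], [2, IGNORE_INDEX], [3, 0]]
print(multi_codebook_loss(uniform, labels))  # ≈ 1.3863 (= log 4)
```

Averaging keeps the loss on the same scale as single-label cross-entropy regardless of how many codebooks the model predicts.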

[WIP] Stable Audio integration

# What does this PR do? Stability AI recently open-sourced [Stable Audio Open 1.0](https://huggingface.co/stabilityai/stable-audio-open-1.0), which can be run using their [toolkit library](https://github.com/Stability-AI/stable-audio-tools/). Unlike in most diffusion models, the diffusion process...

Training is slower after using generate on unwrapped model

### System Info ```Shell - `Accelerate` version: 0.30.1 - Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31 - `accelerate` bash location: /fsx/yoach/env_stable_speech/bin/accelerate - Python version: 3.9.16 - Numpy version: 1.26.4 - PyTorch version (GPU?): 2.1.2+cu121 (True)...

Fix some FastSpeechConformer2 failing tests

# What does this PR do? - Fixes #31270, which was caused by a wrong text-to-waveform mapping - Fixes failing single-GPU tests (see [here](https://github.com/huggingface/transformers/actions/runs/9823944203/job/27122464080)) FastSpeechConformer2 also suffers from a failing `test_multi_gpu_data_parallel_forward`,...

Correct Whisper's beam search scores computation

Fixes #32246. There have been many failing Whisper tests over the past few days, so I'd wait for them to be fixed before merging this PR. --- **What does this...
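For context, `transformers` beam search ranks finished hypotheses by a length-normalized score: the sum of token log-probabilities divided by `length ** length_penalty`. The sketch below illustrates only that general formula; the specific correction made in this PR is cut off in the preview, and the helper names are hypothetical. It assumes that only generated tokens count toward the length.

```python
from typing import List, Sequence


def beam_score(token_logprobs: Sequence[float], length_penalty: float = 1.0) -> float:
    """Length-normalized score: sum of log-probs / length ** length_penalty."""
    # Assumption: only generated tokens are counted, not the decoder prompt.
    return sum(token_logprobs) / (len(token_logprobs) ** length_penalty)


def best_hypothesis(hypotheses: List[List[float]], length_penalty: float = 1.0) -> int:
    """Return the index of the hypothesis with the highest normalized score."""
    scores = [beam_score(h, length_penalty) for h in hypotheses]
    return max(range(len(scores)), key=scores.__getitem__)


# A short hypothesis vs. a longer one with better per-token log-probs.
hyps = [[-1.0], [-0.6, -0.6]]
print(best_hypothesis(hyps, length_penalty=1.0))  # 1: -1.2 / 2 = -0.6 beats -1.0
print(best_hypothesis(hyps, length_penalty=0.0))  # 0: no normalization, -1.0 beats -1.2
```

Because the ranking depends on how the length is counted, any mismatch in that count (or in which log-probabilities are summed) can change which hypothesis is returned.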

[Documentation Contribution] Voice consistency

There's been an effort to introduce voice consistency, as explained [here](https://github.com/huggingface/parler-tts/issues/95). It'd be great to open a PR documenting it in the README.md or the INFERENCE.md. Who would like...

documentation

good first issue