Yoach Lacombe

Results 10 issues of Yoach Lacombe

This PR aims to integrate Bark, a TTS model, into `transformers`. `Bark` was designed and trained by the [Suno-AI team](https://github.com/suno-ai/bark) and is made of 4 main components: - A `semantic model...

In the same vein as the MMS ASR finetuning, I've created a [repo](https://github.com/ylacombe/finetune-hf-vits) that allows MMS TTS finetuning, and updated the MMS README to reflect it! You can...

CLA Signed

# What does this PR do? @cifkao found an edge case that happens when the input of Whisper.generate is complete silence. This is a simple, tentative PR. cc @sanchit-gandhi...

As discussed in #2 and internally with @Zengyi-Qin, this PR adds HF hub compatibility. I left the option to download from the old links, but I can still clean it...

This PR aims to add training compatibility for Musicgen and Musicgen Melody. The main difference with classic cross-entropy is that there are `num_codebooks` labels to predict per timestamp instead of...

# What does this PR do? Stability AI recently open-sourced [Stable Audio 1.0](https://huggingface.co/stabilityai/stable-audio-open-1.0), which can be run using their [toolkit library](https://github.com/Stability-AI/stable-audio-tools/). Contrary to most diffusion models, the diffusion process...

### System Info ```Shell - `Accelerate` version: 0.30.1 - Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31 - `accelerate` bash location: /fsx/yoach/env_stable_speech/bin/accelerate - Python version: 3.9.16 - Numpy version: 1.26.4 - PyTorch version (GPU?): 2.1.2+cu121 (True)...

# What does this PR do? - Fixes #31270, which happened because of the wrong text-to-waveform mapping - Fixes failing single-GPU tests (cf. [here](https://github.com/huggingface/transformers/actions/runs/9823944203/job/27122464080)) FastSpeechConformer2 also suffers from failing `test_multi_gpu_data_parallel_forward`,...

Fixes #32246. There have been many failing Whisper tests over the past few days, so I'd probably wait for them to be fixed before merging this PR. --- **What does this...

There's been an effort to introduce voice consistency, as explained [here](https://github.com/huggingface/parler-tts/issues/95). It'd be great to open a PR to write about it in the README.md or the INFERENCE.md, who would like...

documentation
good first issue