Yoach Lacombe
Add bark
This PR aims to integrate Bark, a TTS model, into `transformers`. `Bark` was designed and trained by the [Suno-AI team](https://github.com/suno-ai/bark) and is made of 4 main components: - A `semantic model...
In the same vein as the MMS ASR finetuning, I've created a [repo](https://github.com/ylacombe/finetune-hf-vits) that allows MMS TTS finetuning, and updated the MMS README to reflect it! You can...
# What does this PR do? @cifkao found an edge case that happens when the input to Whisper.generate is complete silence. This is a simple tentative PR. cc @sanchit-gandhi...
As discussed in #2 and internally with @Zengyi-Qin, this PR adds HF hub compatibility. I left the option to download from the old links, but I can still clean it...
This PR aims to add training compatibility for Musicgen and Musicgen Melody. The main difference with classic cross-entropy is that there are `num_codebooks` labels to predict per timestamp instead of...
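A minimal sketch of what a loss over `num_codebooks` labels per step could look like, written in plain Python for readability. This is not the PR's implementation: the function name `codebook_cross_entropy`, the nested-list layout, the `ignore_index=-100` padding convention, and the choice to average the per-codebook losses are all assumptions made for illustration.

```python
import math


def codebook_cross_entropy(logits, labels, ignore_index=-100):
    """Average cross-entropy over several codebooks (illustrative only).

    logits: nested list shaped [num_codebooks][seq_len][vocab_size]
    labels: nested list shaped [num_codebooks][seq_len], integer class ids
    ignore_index: label value to skip (assumed padding convention)
    """
    per_codebook_losses = []
    for cb_logits, cb_labels in zip(logits, labels):
        total, count = 0.0, 0
        for step_logits, label in zip(cb_logits, cb_labels):
            if label == ignore_index:
                continue  # padded position, no contribution
            # Numerically stable log-sum-exp over the vocabulary.
            peak = max(step_logits)
            log_z = peak + math.log(sum(math.exp(x - peak) for x in step_logits))
            # Negative log-likelihood of the target class at this step.
            total += log_z - step_logits[label]
            count += 1
        if count:
            per_codebook_losses.append(total / count)
    if not per_codebook_losses:
        return 0.0
    # Assumption: every codebook is weighted equally.
    return sum(per_codebook_losses) / len(per_codebook_losses)


# Two codebooks, three steps, vocabulary of four entries, uniform logits.
example_logits = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
example_labels = [[0, 1, 2], [3, -100, 1]]
example_loss = codebook_cross_entropy(example_logits, example_labels)
```

With uniform logits every unmasked step contributes `log(vocab_size)`, which makes the example easy to check by hand.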
# What does this PR do? Stability AI recently open-sourced [Stable Audio 1.0](https://huggingface.co/stabilityai/stable-audio-open-1.0), which can be run using their [toolkit library](https://github.com/Stability-AI/stable-audio-tools/). Contrary to most diffusion models, the diffusion process...
### System Info ```Shell - `Accelerate` version: 0.30.1 - Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31 - `accelerate` bash location: /fsx/yoach/env_stable_speech/bin/accelerate - Python version: 3.9.16 - Numpy version: 1.26.4 - PyTorch version (GPU?): 2.1.2+cu121 (True)...
# What does this PR do? - Fixes #31270, which was caused by the wrong text-to-waveform mapping - Fixes failing single-GPU tests (cf. [here](https://github.com/huggingface/transformers/actions/runs/9823944203/job/27122464080)) FastSpeechConformer2 also suffers from a failing `test_multi_gpu_data_parallel_forward`,...
Fixes #32246 There have been many failing tests these past days with Whisper, so I'd probably wait for them to be fixed before merging this PR. --- **What does this...
There's been an effort to introduce voice consistency, as explained [here](https://github.com/huggingface/parler-tts/issues/95). It'd be great to open a PR to write about it in the README.md or the INFERENCE.md, who would like...