DEX-TTS
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
This repository is the official implementation of DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability.
In this repository, we provide steps for running DEX-TTS and GeDEX-TTS.
🙏 We recommend you visit our demo site. 🙏
DEX-TTS is a diffusion-based expressive TTS model that uses reference speech. The overall architecture of DEX-TTS is shown below:
GeDEX-TTS is the general version of DEX-TTS and does not use reference speech. The overall architecture of GeDEX-TTS is shown below:
Shortcuts
Links to the demo site, the paper, and the code are below.
[👉 Demo] [📄 Paper] [💻 DEX-TTS Code] [💻 GeDEX-TTS Code]
ToDo
- [X] BigVGAN vocoder for multi-speaker TTS
- [ ] Multi-GPU training code
- [ ] LibriTTS & Simpe preprocess recipes
- [ ] Pre-trained weights for DEX-TTS
- [X] Pre-trained weights for GeDEX-TTS
- [ ] Precondition VE & VP
- [ ] Evaluation
Citation
@article{park2024dex,
title={DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability},
author={Park, Hyun Joon and Kim, Jin Sob and Shin, Wooseok and Han, Sung Won},
journal={arXiv preprint arXiv:2406.19135},
year={2024}
}
License
This repository will be released under the MIT license.
This repository builds on several open-source codebases, including RetNet, FastSpeech2, Grad-TTS, DiT, MaskDiT, and EDM. We thank their authors.