Training Pipeline + Steps for training TTS

Open m-hamza-mughal opened this issue 5 years ago • 0 comments

Hi, thanks for this clean and great implementation of MelNet. I'm a beginner in speech synthesis, so kindly guide me through the steps for training MelNet for TTS. What I know/assume:

- Training is done separately for each tier. For TTS, we set the tier flag to 1 and the tts flag to True.
- For the subsequent tiers, we set the tier flag to 2, 3, 4, 5 and 6 respectively, with the tts flag set to False.
- Finally, we put the checkpoints for each tier in inference.yaml and pass it to the MelNet class for prediction.
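To make the assumption above concrete, here is a minimal sketch of the training schedule I have in mind, written as plain Python data rather than actual trainer calls. The names `TierRun` and `build_training_plan` are hypothetical and not part of the repository, and the six-tier count is just what I assumed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TierRun:
    """One training run (hypothetical helper, not part of the MelNet repo)."""
    tier: int   # tier number given to the trainer
    tts: bool   # whether the tts flag is enabled for this run


def build_training_plan(num_tiers: int = 6) -> List[TierRun]:
    """Return one run per tier, with tts enabled only for tier 1 (my assumption)."""
    if num_tiers < 1:
        raise ValueError("num_tiers must be at least 1")
    return [TierRun(tier=t, tts=(t == 1)) for t in range(1, num_tiers + 1)]


plan = build_training_plan()
for run in plan:
    print(f"tier={run.tier} tts={run.tts}")
```

Each entry would correspond to one separate training run, and the resulting six checkpoints would then be listed in inference.yaml. Please correct me if this schedule is wrong.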

Therefore I have some questions:

- Can you provide or confirm the steps to train multiple tiers for the TTS option?
- Are we supposed to train TTS with the --tts flag set to True while keeping the tier number at 1?
- What do you mean by this in README.md: "The -s flag is a boolean for determining whether to train a TTS tier. Since a TTS tier only differs at tier 1, this flag is ignored when [tier number] != 0."
  - Where in the code is the condition you refer to here: [tier number] != 0?
  - I assume this means the tts flag is ignored when the tier number is >= 2?
- What is the difference between the tts argument for the trainer and the tier number in the config file (YAML)? Should they be the same, and if not, how do they differ?
- How do we know that the model for each tier has converged? What train/test loss value should we reach at minimum? What was your training time, and on what GPU?
- Lastly, can we generate mel outputs from only some of the trained tiers? For example, if we have the TTS model plus a few consecutive tiers, can we run inference on them to check training performance?

Sep 15 '20 11:09 m-hamza-mughal