Training Pipeline + Steps for training TTS

Open m-hamza-mughal opened this issue 5 years ago • 0 comments

Hi, thanks for this clean and great implementation of MelNet. I'm a beginner in speech synthesis, so kindly guide me through the steps for training MelNet for TTS.

What I know/assume:

  • Training is done separately for each tier. For the TTS tier, we set the tier flag to 1 and the tts flag to True.
  • For the subsequent tiers, we set the tier flag to 2, 3, 4, 5 and 6 respectively, with the tts flag set to False.
  • Finally, we put the checkpoints for each tier in inference.yaml and pass it to the MelNet class for prediction.
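
To make the assumption above concrete, here is a rough sketch of the training plan I have in mind. It only enumerates the (tier, tts) combinations and does not call anything from this repo; TierRun, planned_runs and the default of 6 tiers are names and values I made up for illustration, so please correct me if the real pipeline differs.

```python
from typing import List, NamedTuple


class TierRun(NamedTuple):
    """One planned training run (hypothetical helper, not part of the repo)."""
    tier: int
    tts: bool


def planned_runs(num_tiers: int = 6) -> List[TierRun]:
    """One run per tier; only tier 1 is assumed to be trained as the TTS tier."""
    return [TierRun(tier=t, tts=(t == 1)) for t in range(1, num_tiers + 1)]


if __name__ == "__main__":
    # Print the plan so it can be compared against the intended procedure.
    for run in planned_runs():
        print(f"tier={run.tier} tts={run.tts}")
```

Each entry would then correspond to one separate training run, and the resulting checkpoints would be listed in inference.yaml.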

Therefore I have some questions:

  • Can you provide/confirm the steps to train multiple tiers for the TTS option?

  • Are we supposed to train TTS with the --tts flag set to True while keeping the tier number at 1?

  • What do you mean by this in README.md: "The -s flag is a boolean for determining whether to train a TTS tier. Since a TTS tier only differs at tier 1, this flag is ignored when [tier number] != 0."

    • Where in the code is the condition you referred to here: [tier number] != 0?
    • I assume this means we should ignore the tts flag when tier number > 2?
  • What is the difference between the tts arg for the trainer and the tier number in the config file (YAML)? Should they be the same? If not, how do they differ?

  • How do we know that our model (for each tier) has converged? What is the minimum train/test loss value we should achieve? What was your training time, and on what GPU?

  • Lastly, can we generate Mel outputs from different trained tier models? For example, if we have the TTS model plus some consecutive tiers, can we run inference on them to check training performance?

m-hamza-mughal, Sep 15 '20 11:09