LEE YOON HYUNG
Hi. I had similar problems when I tried to apply DCA to NVIDIA/tacotron2 (e.g. it could not learn the alignment and produced NaN loss). Then, when I used beta=6.3 and grad_threshold=0.05...
(1) A higher beta actually decelerates the movement of the alignment at each decoder step (the average movement per step is nα/(α + β)). (2) When I apply noam_lr_scheduling, it seems really helpful for the training.
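As a rough illustration of both points (not from the original comment), here is a minimal sketch assuming the beta-binomial prior used by DCA and the standard Noam schedule from the Transformer paper; alpha, n, d_model and warmup_steps are placeholder values, not the commenter's settings:

```python
from scipy.stats import betabinom

# (1) The expected forward movement of the beta-binomial prior is
#     n * alpha / (alpha + beta) per decoder step, so a larger beta
#     slows the alignment down. alpha and n are placeholder values.
alpha, n = 0.1, 10
for beta in (0.9, 6.3):
    print(f"beta={beta}: mean movement = {betabinom(n, alpha, beta).mean():.3f}")

# (2) Noam learning-rate schedule from the Transformer paper;
#     d_model and warmup_steps are placeholder values.
def noam_lr(step, d_model=256, warmup_steps=4000):
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```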
I think it is quite natural that learning multi-speaker TTS is more difficult. Thank you for your reply :D
Did you change the file 'functional.py' in the original PyTorch package following the description at the bottom of the README.md? I think that could cause the problem. Also, I revised this...
According to the authors of FastSpeech, it is important to use proper alignments during training. When I first implemented Transformer-TTS, I failed to implement it perfectly, and...
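For context, a common way to obtain such alignments (a sketch under assumed shapes, not necessarily what was done here) is to take the teacher model's attention matrix and count, for each input token, how many decoder frames attend to it most strongly:

```python
import torch

def durations_from_attention(attn: torch.Tensor) -> torch.Tensor:
    """attn: (T_dec, T_enc) teacher attention weights.
    For each decoder frame, take the most-attended encoder position and
    count how many frames each position wins; the counts are the per-token
    durations (they sum to T_dec)."""
    idx = attn.argmax(dim=1)                         # (T_dec,)
    return torch.bincount(idx, minlength=attn.shape[1])
```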
Thanks for trying to cheer us up, but we still cannot manage to train AlignTTS. We are having a hard time training the MDN because of the numerical...
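The usual remedy for this kind of numerical trouble in MDN-style alignment losses is to keep everything in the log domain (and to run the alignment recursion with torch.logsumexp). A minimal sketch, with assumed tensor shapes rather than the repository's actual code:

```python
import math
import torch

def gaussian_log_prob(frames: torch.Tensor, mu: torch.Tensor,
                      log_sigma: torch.Tensor) -> torch.Tensor:
    """frames: (T, D) mel frames; mu, log_sigma: (N, D) per-token Gaussians.
    Returns (T, N) log p(frame_t | token_n), computed entirely in the log
    domain so that tiny probabilities never appear explicitly."""
    diff = frames.unsqueeze(1) - mu.unsqueeze(0)             # (T, N, D)
    return -0.5 * ((diff * torch.exp(-log_sigma)) ** 2
                   + 2.0 * log_sigma
                   + math.log(2.0 * math.pi)).sum(dim=-1)
```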
I believe, though I'm not entirely certain, that the 50 refers to the sampling rate of the semantic tokens (50 tokens/sec).