FastSpeech2: My generated outputs all have a beeping sound, although the alignment is correct.

I have been training on my own custom data for a while now. I used an aligner, and the alignment seems to be working fine. I added the TextGrid files to the model and trained for around 2 hours on a GPU (I have around 40 minutes of augmented data), but all of my synthesized outputs come out as beeps. Any idea how to solve this issue? Should I be using more data? More training time? Is my data bad?

May 10 '22 22:05 wolfassi123

@wolfassi123 Were you able to fix it? I am also getting a beeping sound when training a model on the LJSpeech dataset.

Jun 09 '22 13:06 samin9796

Were you able to fix it? I am facing the same problem. @wolfassi123 @samin9796

Jun 19 '22 14:06 zaynabmu

@zaynabmu @samin9796 Did you fix it? I face this problem when I use frame-level pitch and energy features. The audio synthesized during the training phase (for both train and val data) sounds good, but the audio synthesized in the inference phase sounds bad.

Mar 12 '23 13:03 hhm853610070