Any guidelines for tuning noise_scale_w?

Open TinaChen95 opened this issue 2 years ago • 2 comments

I found that adjusting noise_scale_w affects the smoothness of the synthesized speech. When noise_scale_w is close to 1, the speech is slower and more intermittent. When noise_scale_w is close to 0, the speech is fast and the intonation is flat. Do you have any experience with how to adjust noise_scale_w?

Jun 11 '23 02:06 TinaChen95
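
For context, here is a toy sketch of why this parameter can change pacing. In the reference VITS inference code, noise_scale_w scales the Gaussian noise fed to the stochastic duration predictor; the predicted log-duration of each phoneme is then exponentiated and rounded up to whole frames. The sketch below replaces the predictor's learned flows with a plain log-normal model, which is an assumption and not the real network. The names `sample_durations`, `mean_log_dur`, and `spread` are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_durations(noise_scale_w, n_phonemes=2000, mean_log_dur=1.5,
                     spread=0.6, seed=0):
    """Toy stand-in for a stochastic duration predictor.

    Assumption: log-duration = mean_log_dur + spread * noise_scale_w * eps,
    with eps ~ N(0, 1). The real VITS predictor pushes the scaled noise
    through learned flows instead of this linear map.
    """
    rng = random.Random(seed)
    durations = []
    for _ in range(n_phonemes):
        eps = rng.gauss(0.0, 1.0)
        log_w = mean_log_dur + spread * noise_scale_w * eps
        # Exponentiate and round up to whole frames, mirroring the
        # exp-then-ceil step in the reference inference code.
        durations.append(math.ceil(math.exp(log_w)))
    return durations


def summarize(durations):
    """Return (mean, standard deviation) of a list of frame counts."""
    mean = sum(durations) / len(durations)
    var = sum((d - mean) ** 2 for d in durations) / len(durations)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        mean, std = summarize(sample_durations(w))
        print(f"noise_scale_w={w:.1f}  mean frames={mean:.2f}  std={std:.2f}")
```

In this toy model, a value of 0 gives every phoneme the same duration, while larger values give durations that vary more and are longer on average. That is consistent with the flat, fast speech near 0 and the slower, less even speech near 1 described above, but a trained model may behave differently.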

Use a translator to read the description of the generation parameters here: https://github.com/w4123/vits

Jun 12 '23 04:06 nikich340

Thanks very much. Does that mean we can only try multiple values and listen to the audio to choose one? Or is there a better way to decide?

Also, I've found that some datasets work fine with noise_scale_w=1, but with other datasets the synthesized speech stutters. Why is this? Does it mean I should train longer?

Jun 15 '23 09:06 TinaChen95
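
One way to make the try-and-listen approach more systematic is to render the same sentence over a grid of values and compare the files side by side. Below is a minimal sketch, assuming a hypothetical `synthesize(text, noise_scale_w)` callable that wraps your model and returns float audio samples in [-1, 1]. The helper names, the 22050 Hz sample rate, the grid values, and the file naming are all illustrative assumptions, not part of VITS. `fake_synthesize` is a placeholder so the sketch runs without a model.

```python
import math
import struct
import wave
from pathlib import Path


def write_wav(path, samples, sample_rate=22050):
    """Write float samples in [-1, 1] as 16-bit mono PCM."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def sweep_noise_scale_w(synthesize, text, values, out_dir, sample_rate=22050):
    """Render `text` once per value; return {value: (path, seconds)}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for value in values:
        samples = synthesize(text, value)
        path = out_dir / f"noise_scale_w_{value:.2f}.wav"
        write_wav(path, samples, sample_rate)
        # Total length is a rough proxy for speaking rate.
        results[value] = (path, len(samples) / sample_rate)
    return results


def fake_synthesize(text, noise_scale_w):
    """Placeholder model: a tone whose length grows with the value."""
    n = int(22050 * 0.1 * (1.0 + noise_scale_w))
    return [0.2 * math.sin(2 * math.pi * 220 * i / 22050) for i in range(n)]


if __name__ == "__main__":
    grid = [0.0, 0.4, 0.8, 1.0]
    results = sweep_noise_scale_w(fake_synthesize, "Hello.", grid, "sweep_out")
    for value, (path, seconds) in results.items():
        print(f"{value:.2f}  {seconds:.2f}s  {path}")
```

With a real model, `synthesize` would call the model's inference method with the given value; in the reference VITS code this is the `noise_scale_w` argument of `infer`. Fixing the random seed before each call (for example with `torch.manual_seed`) helps ensure that differences between files come from the parameter rather than from the noise sample.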