Any guidelines for tuning noise_scale_w?
I found that adjusting noise_scale_w affects the smoothness of the synthesized speech. When noise_scale_w is close to 1, the speech is slower and more intermittent. When noise_scale_w is close to 0, the speech is fast and the intonation is flat. Do you have any experience with how to adjust noise_scale_w?
Use a translator to read the description of the generation parameters here: https://github.com/w4123/vits
Thanks very much. Does that mean we can only try multiple values and listen to the audio to choose one? Or is there a better way to decide?
Also, I've found that some datasets work fine with noise_scale_w=1, but with other datasets the synthesized speech stutters. Why is this? Does that mean I should train longer?