hyzhan

3 issues by hyzhan

In addition to rhythm, can this model control the emotion of the audio? Is there an example?

I used the pre-trained model with different reference audio clips, but the resulting speech barely changes. What could be the reason for this?
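One way to narrow this down is to check whether the different reference clips actually produce distinct speaker embeddings. The sketch below is a hypothetical sanity check, not part of this repository: `embed_a` and `embed_b` stand in for the vectors a pre-trained speaker encoder would return for two reference clips (here they are dummy values), and the cosine similarity between them is computed with NumPy.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy embeddings standing in for the encoder's output on two
# different reference clips (replace with real encoder outputs).
embed_a = np.array([0.9, 0.1, 0.2])
embed_b = np.array([0.88, 0.12, 0.21])

sim = cosine_similarity(embed_a, embed_b)
# A similarity very close to 1.0 means the encoder is not
# distinguishing the reference speakers, which would explain
# why the synthesized audio barely changes.
print(f"cosine similarity: {sim:.3f}")
```

If embeddings from clearly different speakers come out nearly identical, the problem is likely in the encoder or the audio preprocessing rather than in the synthesis model itself.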

Is it not necessary to train a speaker encoder? There seems to be no such step in the code. Do we need to use a pre-trained model instead?