Results: 3 issues by hyzhan
In addition to rhythm, can this model control the emotion of the audio? Is there an example?
I used the pre-trained models with different reference audio, but the resulting audio barely changes. What could be the reason for this?
Is there no need to train a speaker encoder? There seems to be no such step in the code. Should we use a pre-trained model instead?