Speech synthesis results

Open · athenasaurav opened this issue

Hello @hcy71o,

I liked your work on Transfer TTS and SC VITS. I have trained a model for 350,000 steps using only the LibriTTS train-clean-100 subset, but when I synthesize speech with an arbitrary reference audio file, the generated speech is not clear.

So, my questions are:

1. How many steps did you train your model for?
2. How long (in duration) should the reference audio file passed to inference.py be?
3. Should the reference audio come from a speaker in the training data, or can it be from an unseen speaker?
4. Do you have a demo page where we can compare audio generated by Transfer TTS with VITS?
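For reference when comparing notes on question 2, here is a minimal sketch for reporting the duration, sample rate, and channel count of a reference clip before it is passed to inference.py. It assumes the clip is an uncompressed PCM WAV file readable by Python's standard `wave` module. The script name `check_ref.py` and the function `describe_wav` are hypothetical, and the sketch does not call inference.py or assume anything about what duration or format it expects.

```python
import sys
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties of a PCM WAV file without decoding its samples."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()      # samples per second, per channel
        frames = wf.getnframes()      # number of sample frames in the file
        channels = wf.getnchannels()  # 1 = mono, 2 = stereo
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,  # frames divided by rate gives seconds
    }


if __name__ == "__main__":
    # Hypothetical usage: python check_ref.py reference.wav [more.wav ...]
    for p in sys.argv[1:]:
        info = describe_wav(p)
        print(f"{p}: {info['duration_s']:.2f} s, "
              f"{info['sample_rate']} Hz, {info['channels']} channel(s)")
```

Running it on each reference clip makes it easy to state exactly what was fed to inference.py when reporting unclear output.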

Thanks

Dec 10 '22 16:12 athenasaurav