Questions about training on 44 kHz audio files

Open rlarhk147 opened this issue 4 years ago • 3 comments

Hello,

I trained at 44 kHz for higher-quality VC, because the results were good when I trained on VCTK at 22 kHz.

With the 44 kHz model, however, TTS inference reads the text very quickly.

Are there any parameters I need to adjust when training on 44 kHz audio rather than 22 kHz audio?

rlarhk147 • Jul 19 '21 02:07

You probably need to set the "sampling_rate" parameter in the data section of the config file to match your audio when training.
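As a rough illustration of that suggestion, here is a minimal sketch of scaling the frame-related settings along with "sampling_rate" so that each frame still covers the same amount of time. Only "sampling_rate" in the data section is named in this thread; the other keys ("filter_length", "hop_length", "win_length"), the 22050 Hz starting values, the upsample factors, and both helper functions are assumptions made for the example, so check them against your own config file. Nothing here is confirmed as the cause of the fast speech.

```python
import math


def scale_frame_params(data_cfg, new_rate):
    """Return a copy of data_cfg with frame sizes scaled to new_rate.

    Hypothetical helper: the key names are assumed, not taken from the thread.
    """
    ratio = new_rate / data_cfg["sampling_rate"]
    scaled = dict(data_cfg)
    scaled["sampling_rate"] = new_rate
    for key in ("filter_length", "hop_length", "win_length"):
        # Keep the frame duration in seconds constant across sample rates.
        scaled[key] = int(round(data_cfg[key] * ratio))
    return scaled


def frame_ms(data_cfg):
    """Duration of one hop in milliseconds."""
    return 1000.0 * data_cfg["hop_length"] / data_cfg["sampling_rate"]


def upsample_matches_hop(upsample_rates, hop_length):
    """A HiFi-GAN-style decoder emits prod(upsample_rates) samples per frame,
    so that product has to equal the hop length."""
    return math.prod(upsample_rates) == hop_length


# Assumed 22.05 kHz starting point (illustrative values only).
base = {"sampling_rate": 22050, "filter_length": 1024,
        "hop_length": 256, "win_length": 1024}
scaled = scale_frame_params(base, 44100)

print(scaled)
print(frame_ms(base), frame_ms(scaled))
print(upsample_matches_hop([8, 8, 2, 2], scaled["hop_length"]))
```

If the hop length changes, the decoder's upsampling factors would need to change with it (for example, factors whose product is 512 instead of 256); the last line prints whether an assumed set of factors still matches.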

Anyway, the human speaking voice carries no relevant information above 8 kHz, so in my humble opinion a sampling rate of 22 kHz is sufficient.
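The arithmetic behind that opinion is the sampling theorem: audio sampled at a given rate can represent frequencies only up to half that rate. A small sketch, where `nyquist_hz` is a hypothetical helper name and the 8 kHz threshold is the commenter's figure rather than something verified here:

```python
def nyquist_hz(sampling_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sampling_rate_hz / 2


SPEECH_LIMIT_HZ = 8000  # the commenter's estimate, taken as given

for rate in (16000, 22050, 44100):
    limit = nyquist_hz(rate)
    print(f"{rate} Hz sampling -> up to {limit} Hz, "
          f"covers 8 kHz: {limit >= SPEECH_LIMIT_HZ}")
```

Under that estimate, 22050 Hz audio already reaches 11025 Hz, and 44100 Hz extends the range to 22050 Hz.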

domcross • Aug 29 '21 12:08

What batch size did you use, and how much GPU VRAM did it take?

nikich340 • Nov 23 '21 16:11

@rlarhk147 How did your training with the 44 kHz audio files go? Did it produce good results?

tuannvhust • Oct 05 '22 04:10