Kelsey

9 comments by Kelsey

@gyc666 Did fine-tuning work for you, or did you have to train a model from scratch for your accented English dataset?

Can you link to the fork? I'm also having issues on Apple silicon

The LibriTTS dataset is sampled at only 24 kHz, so you would need to find a different dataset to re-train at 44k

You can steer the delivery by priming with emotional words, e.g. ("I am very excited....") but there's no explicit speed control

@kalioiczys Did you manage to find a setting that works? I'm having the same problem of American women coming out sounding British.

I fixed this by editing stft.py line 42:
fft_window = pad_center(fft_window, size=filter_length, mode='constant')
and similarly on line 129:
mel_basis = librosa_mel_fn(sr=sampling_rate, n_fft=filter_length, n_mels=n_mel_channels, fmin=mel_fmin, fmax=mel_fmax)
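
For anyone wondering why the keyword form matters, here is a minimal sketch. It assumes the error comes from a librosa release in which these arguments became keyword-only (the comment above doesn't say which version was involved). `pad_center_like` is a hypothetical stand-in with the same calling convention, not librosa's implementation:

```python
# Hypothetical stand-in for a function whose arguments after `data`
# are keyword-only (the bare `*` in the signature enforces this).
def pad_center_like(data, *, size, mode="constant"):
    """Center `data` in a list of length `size`, padding with zeros."""
    if mode != "constant":
        raise ValueError("this sketch only supports mode='constant'")
    if size < len(data):
        raise ValueError("size must be at least len(data)")
    left = (size - len(data)) // 2
    right = size - len(data) - left
    return [0] * left + list(data) + [0] * right


window = [1, 2, 3, 4]

# Old positional style: rejected with a TypeError.
try:
    pad_center_like(window, 8)
    positional_ok = True
except TypeError:
    positional_ok = False

# Keyword style, as in the stft.py edit: accepted.
padded = pad_center_like(window, size=8, mode="constant")
```

Passing every argument by name, as in the two edited lines, should be accepted whether or not the installed version enforces keyword-only arguments.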

I see, thank you very much!
On Fri, Mar 3, 2023 at 7:17 PM Tomoki Hayashi ***@***.***> wrote:
> You can simply increase upsample scale here.
>
> https://github.com/kan-bayashi/ParallelWaveGAN/blob/ffaa99fe77d3b0703e5857177fd9b2ecc18cb0bd/egs/ljspeech/voc1/conf/hifigan.v1.yaml#L38-L39
>...
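
A rough sketch of what "increase upsample scale" implies, assuming a HiFi-GAN-style generator in which each input frame is expanded by the product of the upsample scales, so that product has to equal the hop size used for feature extraction. The helper name and the numbers are illustrative assumptions, not values read from the linked config:

```python
import math


def check_upsample_scales(upsample_scales, hop_size):
    """Return True if the generator emits exactly hop_size samples per frame."""
    return math.prod(upsample_scales) == hop_size


# Illustrative values only, not taken from the linked config.
base_scales = [8, 8, 2, 2]      # product = 256
assert check_upsample_scales(base_scales, hop_size=256)

# Doubling one factor doubles the samples produced per frame.
larger_scales = [8, 8, 4, 2]    # product = 512
samples_per_frame = math.prod(larger_scales)
```

If the scales are changed, the hop size (and anything tied to it, such as the features the vocoder is trained on) presumably has to change with them.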

Some of the datasets used for training are licensed for non-commercial use only