Kelsey
@gyc666 Did fine-tuning work for you? Or did you have to train a model from scratch for your accented English dataset?
Can you link to the fork? I'm also having issues on Apple silicon
I have the same error
The LibriTTS dataset is only at 24 kHz, so you would need to find a different dataset to retrain at 44k.
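To spell out why resampling the existing data would not help: a recording can only hold frequencies up to half its sample rate, so 24 kHz audio has nothing above 12 kHz no matter what rate it is converted to afterwards. A minimal sketch of that arithmetic, assuming "44k" means the usual 44.1 kHz (the function name is just for illustration):

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2


# 24 kHz is the LibriTTS rate mentioned above; 44.1 kHz is an assumed target.
source_limit = nyquist_hz(24_000)
target_limit = nyquist_hz(44_100)

# Band that a 44.1 kHz model could use but upsampled 24 kHz data leaves empty.
missing_band_hz = target_limit - source_limit

print(f"24 kHz data covers up to {source_limit:.0f} Hz")
print(f"44.1 kHz data covers up to {target_limit:.0f} Hz")
print(f"Missing band: {missing_band_hz:.0f} Hz")
```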
You can influence the delivery by priming with emotional words, e.g. "I am very excited....", but there's no explicit speed control.
@kalioiczys Did you manage to find a setting that works? I'm having the same problem of American women coming out sounding British.
I fixed this by editing stft.py to pass keyword arguments. Line 42: fft_window = pad_center(fft_window, size=filter_length, mode='constant') and similarly on line 129: mel_basis = librosa_mel_fn(sr=sampling_rate, n_fft=filter_length, n_mels=n_mel_channels, fmin=mel_fmin, fmax=mel_fmax)
I see, thank you very much!

On Fri, Mar 3, 2023 at 7:17 PM Tomoki Hayashi wrote:
> You can simply increase upsample scale here.
>
> https://github.com/kan-bayashi/ParallelWaveGAN/blob/ffaa99fe77d3b0703e5857177fd9b2ecc18cb0bd/egs/ljspeech/voc1/conf/hifigan.v1.yaml#L38-L39
>...
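One thing to watch when changing the upsample scales: in a HiFi-GAN style generator, each mel frame is expanded into hop-size waveform samples, so the product of the scales should equal the hop size used for feature extraction. A minimal sketch of that sanity check, where the function name and the numbers are illustrative assumptions and should be replaced with the values from your own config:

```python
from math import prod


def scales_match_hop(upsample_scales, hop_size):
    """Return True when the total upsampling factor equals the hop size."""
    return prod(upsample_scales) == hop_size


# Illustrative values only, not read from any particular config file.
baseline_scales = [8, 8, 2, 2]   # total factor 256
larger_scales = [8, 8, 4, 2]     # total factor 512

print(scales_match_hop(baseline_scales, hop_size=256))
print(scales_match_hop(larger_scales, hop_size=256))
print(scales_match_hop(larger_scales, hop_size=512))
```

If the check fails after editing the scales, the hop size (and any kernel sizes tied to each scale) likely needs to change with them.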
Some of the datasets used for training are licensed for non-commercial use only