
WaveRNN loss exploded

Open moonnee opened this issue 6 years ago • 4 comments

Hi, I've trained WaveRNN on a 16-bit corpus with a 12.5 ms frame shift and 50 ms window length in MOL mode. The upsample factors are (5, 5, 8), and the remaining hyperparameters are unchanged. The loss exploded at step 570k.
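For reference, these numbers are internally consistent assuming a 16 kHz sample rate (not stated above, but implied): a 12.5 ms frame shift is 200 samples, and the upsample factors must multiply to exactly that hop length. A quick sanity check:

```python
# Sanity check: upsample factors must multiply to the hop length in samples.
# sample_rate = 16000 is an assumption; the post doesn't state it.
sample_rate = 16000
frame_shift_ms = 12.5
hop_length = int(sample_rate * frame_shift_ms / 1000)  # 200 samples

upsample_factors = (5, 5, 8)
prod = 1
for f in upsample_factors:
    prod *= f  # 5 * 5 * 8 = 200

assert prod == hop_length, 'upsample factors must multiply to hop_length'
```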

```
| Epoch: 1516/2733 (359/366) | Loss: nan | 3.9 steps/s | Step: 569k |
| Generating: 1/2 | ████████████████ 36000/36300 | Batch Size: 3 | Gen Rate: 2.5kHz | Gen Time: 14.5276s
Traceback (most recent call last):
  File "train_wavernn.py", line 174, in <module>
    voc_train_loop(voc_model, loss_func, optimiser, train_set, valid_set, valid_syn_set, lr, total_steps, device, list_train_loss, list_loss)
  File "train_wavernn.py", line 60, in voc_train_loop
    hp.voc_target, hp.voc_overlap, paths.voc_output)
  File "/gen_wavernn.py", line 33, in gen_testset
    x_hat = model.generate(m, save_str, batched, target, overlap, hp.mu_law)
  File "/models/fatchord_version.py", line 242, in generate
    save_wav(output, save_path)
  File "/utils/dsp.py", line 22, in save_wav
    librosa.output.write_wav(path, x.astype(np.float32), sr=hp.sample_rate)
  File "tools/anaconda3/lib/python3.6/site-packages/librosa/output.py", line 225, in write_wav
    util.valid_audio(y, mono=False)
  File "/tools/anaconda3/lib/python3.6/site-packages/librosa/util/utils.py", line 170, in valid_audio
    raise ParameterError('Audio buffer is not finite everywhere')
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere
```
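The crash itself is only a downstream symptom: once the loss goes NaN, the generated buffer contains NaNs, and librosa's valid_audio rejects any non-finite audio. If you want the periodic generation step to skip a bad checkpoint instead of killing the run, a guard along these lines would work (a hypothetical wrapper, not part of the repo; it uses soundfile because librosa.output.write_wav was removed in librosa 0.8):

```python
import numpy as np
import soundfile as sf

def save_wav_safe(x, path, sr):
    """Write audio to disk, sanitizing non-finite samples instead of crashing.
    Hypothetical replacement for the repo's save_wav, not part of the codebase."""
    if not np.all(np.isfinite(x)):
        print(f'warning: non-finite samples in {path}; replacing them')
        x = np.nan_to_num(x)  # NaN -> 0.0, +/-Inf -> large finite values
    sf.write(path, x.astype(np.float32), sr)
```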

I also noticed the evaluation samples deteriorating during training: the blowing sounds and noise get worse near the loss explosion. Here are the samples: samples.zip. Could you give me some advice? Thank you.

moonnee avatar Aug 12 '19 08:08 moonnee

This is the loss curve: loss_plot

moonnee avatar Aug 12 '19 08:08 moonnee

@moonnee I don't think I've ever seen the loss explode with wavernn - so I'm kinda surprised and have no insight on that yet.

As for the blowy noise - the model can recover from artifacts that appear during training - it depends on the situation.

One thing I would recommend: try training a 9-bit model with mu-law before trying MOL, as it's much easier and faster to train.
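In hparams.py that corresponds to roughly the following (setting names as in the repo's hparams.py, from memory; verify against your copy):

```python
# hparams.py -- switch the vocoder from MOL to 9-bit mu-law RAW output.
# Names as in fatchord/WaveRNN's hparams.py; verify against your copy.
voc_mode = 'RAW'  # 'RAW' = softmax over quantized samples, 'MOL' = mixture of logistics
bits = 9          # bit depth of the RAW output signal
mu_law = True     # mu-law compand before quantizing (recommended at 9 bits)
```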

fatchord avatar Aug 14 '19 15:08 fatchord

Try adding more data samples, or decrease the learning rate if you think you already have enough data.
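Gradient clipping is another standard guard against this kind of blow-up. A minimal sketch of a clipped training step (generic PyTorch, not the repo's exact loop; clip_norm=4.0 is an illustrative value):

```python
import torch

def train_step(model, loss_func, optimiser, x, mels, y, clip_norm=4.0):
    """One training step with gradient-norm clipping -- a generic sketch,
    not the repo's exact loop. clip_norm=4.0 is illustrative."""
    optimiser.zero_grad()
    y_hat = model(x, mels)
    loss = loss_func(y_hat, y)
    loss.backward()
    # Rescale gradients whose global L2 norm exceeds clip_norm; this caps
    # the size of any single update and helps prevent NaN explosions.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip_norm)
    optimiser.step()
    return loss.item()
```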

Endreje avatar May 03 '20 19:05 Endreje

@fatchord is a loss value around 5.xxx (based on the graph you provided above) the best result you achieved? I trained WaveRNN on my own dataset and the loss in the latest epochs hovers around this value as well. Is that normal? The generated wav files sound distorted.

Note: I only changed the default sampling rate from 22050 to 16000 to match my dataset.
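For reference, if hop_length and win_length stay at the repo defaults (which are specified in samples), lowering the sample rate silently changes the effective analysis resolution. A quick way to see the effect (275/1100 are the repo's defaults for 22050 Hz, as I recall; verify against your hparams.py):

```python
# Effective frame shift/window at each sample rate, with hop/win fixed in
# samples. 275/1100 are assumed repo defaults for 22050 Hz.
hop_length, win_length = 275, 1100

for sr in (22050, 16000):
    print(f'{sr} Hz: frame shift = {1000 * hop_length / sr:.1f} ms, '
          f'window = {1000 * win_length / sr:.1f} ms')
# 22050 Hz: frame shift = 12.5 ms, window = 49.9 ms
# 16000 Hz: frame shift = 17.2 ms, window = 68.8 ms
```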

qivaijar avatar Jun 08 '20 10:06 qivaijar