C++ inference
Hi @geneing! I used the C++ code for inference, but it is very slow: for a mel of shape (80, 500) it takes 53.76 seconds, so seven seconds of audio takes about 53 seconds to synthesize. My hparams:
rnn_dims=256,
fc_dims=128,
sparsity_target=0.90,
How can I optimize the parameters? Thanks!
@maozhiqiang That's unexpectedly slow. On my computer (a 6-year-old laptop) with the same hparams it runs a little slower than real time. Let's check a few things:
- Did you run training long enough to prune the weights? Current code should print how pruning is progressing.
- I'm assuming you ran convert_model to create a weight file.
- Have you compiled the library with optimization? Without at least the "-O2" optimization level the code will run very slowly. Surprisingly, "-O3" produces a library that runs ~30% slower than "-O2". I get the best results with "-O2 -ffast-math".
Sorry about the lack of detailed instructions. I'll get it done...
Hi @geneing! Thank you for your reply. The training log shows: epoch:1801, running loss:211.44811499118805, average loss:1.5547655514057945, current lr:0.0002390525520373693, num_pruned:530824 (0.9%). I used convert_model to convert the .pth checkpoint to a .bin model, and I compiled with cmake. How do I compile with -O2? Thank you!
Run ccmake or cmake-gui. Switch to advanced mode ("t" in ccmake, a checkbox in cmake-gui). Find the CMAKE_BUILD_TYPE entry and set it to RelWithDebInfo. Then find CMAKE_CXX_FLAGS_RELWITHDEBINFO and edit it to include the -ffast-math flag.
You can also set the build type from the cmake command line (for example, cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..): https://cmake.org/pipermail/cmake/2008-March/020347.html
I just added
```cmake
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -g -Wall -O2")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -Wall -O2")
```
to CMakeLists.txt and recompiled. Now, for a mel of shape (80, 500), synthesis takes 5.09 seconds, so seven seconds of audio takes about 5 seconds. Is this correct? Thank you!
Sounds right. The Eigen3 library that I use employs every templating trick to get the best performance. When optimized, its performance is excellent; in debug mode it is extremely inefficient.
A few more flags to play with: "-ffast-math", "-march=native".
Thank you @geneing! But the output is all noise. Am I doing something wrong?
- Make sure that input mel is correct - e.g. use one of the training set inputs.
- It may be easier to debug using the Python library:
```python
import sys
import numpy as np

sys.path.insert(0, 'lib/build-src-RelDebInfo')  # use the correct path to the shared library WaveRNNVocoder...so
import WaveRNNVocoder

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('model_outputs/model.bin')  # path to the converted .bin weights

mel = np.load(fname).T       # check that the first dimension is 80; plot to check that it looks like a correct mel
wav = vocoder.melToWav(mel)  # plot to see what wav looks like
```
thank you!
Hi @geneing! I used synthesize.py and the output is normal, but test_wavernnvocoder.py does not work. Sample: test_0_mel.npy_orig.wav.zip. I don't know why.
@maozhiqiang Could you please attach the mel data you are using as input? Then I can try to reproduce your problem.
@geneing Thank you! The test mel is attached: test_0_mel.npy.zip
@maozhiqiang Works for me with b2f5fc106.

Commands:
```python
import numpy as np
import librosa
import sys

sys.path.insert(0, '../WaveRNN-Pytorch/lib/build-src-RelDebInfo')
import WaveRNNVocoder

mel = np.load('eval/test_0_mel.npy')
vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('../WaveRNN-Pytorch/model_outputs/model.bin')
wav = vocoder.melToWav(mel)
plot(wav)
librosa.output.write_wav('test.wav', wav, 16000)
```
The speech is a bit noisy and quiet. Cantonese?
@geneing Thank you! My result is still noise, and I am using the same code. I don't know why!
thank you! I will try!
Hi! Thank you for this awesome work!
I successfully trained the Pytorch WaveRNN model:
input_type='bits',
bits=10,
rnn_dims=800,
fc_dims=256,
pad=2,
upsample_factors=(4, 4, 16),
compute_dims=128,
res_out_dims=64*2,
res_blocks=10
Now I am trying to run inference on CPU using the C++ library. I compiled the library and ran convert_model.py, but when I try to run inference I get Aborted (core dumped).
If I use the model weights you shared in the above comment it runs perfectly fine.
Anything I might've missed here?
Thanks for your help :)
@alexdemartos Would it be possible for you to obtain a stack trace when this error happens? You may have to recompile in debug mode and either run under gdb or open the core file. It would make it a lot easier to find the cause.
Hi @geneing ,
thanks for your fast response. Sorry, I am not very experienced with C++ debugging. I compiled the library in debug mode, but I don't really know how to get any more detailed info. This is the error from the .so library:
MemoryError: std::bad_alloc
I tried to debug with gdb and the vocoder binary, but it crashes when loading the mel (npy) file (even with your model):
```
(gdb) run -w model.bin -m mels.npy
Starting program: /home/ubuntu/git/WaveRNN-Pytorch/library/debug/vocoder -w model.bin -m mels.npy
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
```
Nevertheless, I noticed loading your model gives the following details:
Loading: model.bin
Loading:Conv1d(80, 64, kernel_size=(5,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,))
Loading:Stretch2d()
Loading:Stretch2d()
Loading:Conv2d(1, 1, kernel_size=(1, 9), stride=(1, 1), padding=(0, 4),
Loading:Stretch2d()
Loading:Conv2d(1, 1, kernel_size=(1, 11), stride=(1, 1), padding=(0, 5),
Loading:Stretch2d()
Loading:Conv2d(1, 1, kernel_size=(1, 21), stride=(1, 1), padding=(0, 10)
Loading:Linear(in_features=112, out_features=128, bias=True)
Loading:GRU(128, 128, batch_first=True)
Loading:Linear(in_features=160, out_features=128, bias=True)
Loading:Linear(in_features=128, out_features=512, bias=True)
While loading mine, the last part of the model is not there:
Loading: checkpoints/model.bin
Loading:Conv1d(80, 128, kernel_size=(5,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
@geneing I tested out the model you have in model_outputs/model.bin.
It's taking me 9.5 seconds to generate 6 seconds of audio. Also, the audio output quality is pretty poor: sample.wav
- What should I expect the performance to be if the model was converted to use CUDA?
- How can I improve audio quality?
@acrosson The network in this repo is designed for best performance on a CPU: low op count, with branching and memory access patterns optimized for pipelined processors. For best performance on a GPU you would use something like WaveGlow: no branching, and a massive op count amortized over thousands of simple compute cores. On my dinky old laptop I can synthesize 9.1 seconds of audio in 8.1 seconds on a single CPU core. There are still some opportunities for further optimization. Either your computer is even slower than mine, or there is something suboptimal in how the code was compiled (the correct -O and -march flags are important).
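One quick check is to time the Python binding directly and compute the real-time factor. This is only a sketch reusing the commands from earlier in this thread; the library path, weight file, mel file, and the 16 kHz output rate are taken from those comments and may need adjusting for your setup:

```python
import sys
import time
import numpy as np

sys.path.insert(0, '../WaveRNN-Pytorch/lib/build-src-RelDebInfo')  # path to the compiled binding
import WaveRNNVocoder

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('../WaveRNN-Pytorch/model_outputs/model.bin')

mel = np.load('eval/test_0_mel.npy')  # any mel with first dimension 80

start = time.time()
wav = vocoder.melToWav(mel)
elapsed = time.time() - start

audio_seconds = len(wav) / 16000.0  # assuming 16 kHz output, as in the write_wav call above
print('synthesis: %.2f s for %.2f s of audio (real-time factor %.2f)'
      % (elapsed, audio_seconds, elapsed / audio_seconds))
```

A real-time factor noticeably above 1.0 with the pretrained model usually points at the build flags rather than the model itself.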
For the sound quality, let's check whether it's due to pruning. I observe that the quality drop with pruning is quite sharp past some "critical" pruning fraction. This critical fraction depends on the dataset used for training: with noisier datasets I have to keep more weights after pruning to maintain sound quality.
If you go to your checkpoints/eval directory, you should have wav outputs every 10K steps or so. Listen to the output at around step 40000. If it sounds OK, check later steps. The step at which it starts to sound bad tells you what fraction of the weights you can prune.
For training with https://github.com/mozilla/TTS/ I can prune up to 90% of the weights with little impact on quality. I applied an FIR band-pass filter with a passband from 95 to 7600 Hz to the M-AILABS dataset I used for training. Here's an example of speech synthesized from text: https://drive.google.com/open?id=1mrV_1RuKOyZxk4gp_81A7l9FmX9qhPAt
Here's one synthesized from mels: https://drive.google.com/open?id=1T-D3jHrI8tlb9EwohaAEdFP0ddwK7LfJ
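A rough sketch of such a band-pass filtering step with scipy is shown below. The filter length, sample rate, and file names here are placeholders, not the exact settings used for the dataset above, and the write_wav call uses the older librosa API that appears earlier in this thread:

```python
import numpy as np
import librosa
from scipy.signal import firwin, filtfilt

sr = 16000       # assumed sample rate; use your dataset's actual rate
numtaps = 255    # assumed filter length

# Band-pass FIR filter keeping roughly 95-7600 Hz (requires scipy >= 1.2 for the fs argument)
taps = firwin(numtaps, [95, 7600], pass_zero=False, fs=sr)

wav, _ = librosa.load('some_utterance.wav', sr=sr)   # hypothetical input file
filtered = filtfilt(taps, [1.0], wav)                # zero-phase filtering, no group delay
librosa.output.write_wav('filtered.wav', filtered.astype(np.float32), sr)
```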
@maozhiqiang How did you export the model for C++ inference? Thank you.
Hi all, when I run "python test_wavernnvocoder.py" I get this error: ImportError: /WaveRNN-Pytorch/library/build/WaveRNNVocoder.so: undefined symbol: PyThread_tss_get. Can anyone tell me how to fix it? Thank you very much!
@geneing Hello, with the hparams below I get a good result, but inference takes about 8 seconds. Is there any way to speed up inference? Thank you.
Model parameters:
rnn_dims=400,
fc_dims=256,
pad=2,
upsample_factors=(4, 5, 10),
compute_dims=64,
res_out_dims=32*2,
res_blocks=3,