C++ inference
Hi @geneing! I used the C++ code for inference, but it is very slow: for a mel of shape (80, 500) it takes 53.76 seconds, so seven seconds of audio takes about 53 seconds to synthesize. My hparams:
rnn_dims=256,
fc_dims=128,
sparsity_target=0.90,
How can I optimize the parameters? Thanks!
@maozhiqiang That's unexpectedly slow. On my computer (a 6-year-old laptop) with the same hparams it runs a little slower than real time. Let's check a few things:
- Did you run training long enough to prune the weights? Current code should print how pruning is progressing.
- I'm assuming you ran convert_model to create a weight file.
- Have you compiled the library with optimization? Without at least the "-O2" optimization level the code will run very slowly. Surprisingly, "-O3" produces a library that runs ~30% slower than "-O2". I get the best results with "-O2 -ffast-math".
Sorry about the lack of detailed instructions. I'll get it done...
Hi @geneing! Thank you for your reply. The training log shows: epoch:1801, running loss:211.44811499118805, average loss:1.5547655514057945, current lr:0.0002390525520373693, num_pruned:530824 (0.9%). I used convert_model to convert the .pth checkpoint to a .bin model, and I compiled with cmake. How do I compile with -O2? Thank you!
Run ccmake or cmake-gui. Switch to advanced mode ("t" in ccmake, a checkbox in cmake-gui). Find the CMAKE_BUILD_TYPE entry and set it to RelWithDebInfo. Then find CMAKE_CXX_FLAGS_RELWITHDEBINFO and edit it to include the -ffast-math flag.
You can also set the build type from the cmake command line (for example, cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..): https://cmake.org/pipermail/cmake/2008-March/020347.html
I just added
```cmake
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -g -Wall -O2")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -Wall -O2")
```
to CMakeLists.txt and recompiled. Now, for a mel of shape (80, 500), synthesis takes 5.09 seconds, so seven seconds of audio takes about 5 seconds. Is this correct? Thank you!
Sounds right. The Eigen3 library that I use employs every templating trick to get the best performance. When optimized, its performance is excellent; in debug mode it is extremely inefficient.
A few more flags to play with: "-ffast-math", "-march=native".
Thank you @geneing! But the output is all noise. Am I doing something wrong?
- Make sure that input mel is correct - e.g. use one of the training set inputs.
- It may be easier to debug using the Python library:
```python
import sys
import numpy as np

sys.path.insert(0, 'lib/build-src-RelDebInfo')  # use the correct path to the shared library WaveRNNVocoder...so
import WaveRNNVocoder

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('model_outputs/model.bin')  # path to the converted .bin weights

mel = np.load(fname).T       # check that the first dimension is 80; plot to check that it looks like a correct mel
wav = vocoder.melToWav(mel)  # plot to see what wav looks like
```
thank you!
Hi @geneing! I used synthesize.py and the output is normal, but test_wavernnvocoder.py does not work. Sample: test_0_mel.npy_orig.wav.zip. I don't know why.
@maozhiqiang Could you please attach the mel data you are using as input? Then I can try to reproduce your problem.
@geneing Thank you! The test mel is attached: test_0_mel.npy.zip
@maozhiqiang Works for me with b2f5fc106.

Commands:
```python
import numpy as np
import librosa
import sys

sys.path.insert(0, '../WaveRNN-Pytorch/lib/build-src-RelDebInfo')
import WaveRNNVocoder

mel = np.load('eval/test_0_mel.npy')
vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('../WaveRNN-Pytorch/model_outputs/model.bin')
wav = vocoder.melToWav(mel)
plot(wav)
librosa.output.write_wav('test.wav', wav, 16000)
```
The speech is a bit noisy and quiet. Cantonese?
@geneing Thank you! My result is still noise, and I am using the same code. I don't know why!
thank you! I will try!
Hi! Thank you for this awesome work!
I successfully trained the Pytorch WaveRNN model:
input_type='bits',
bits=10,
rnn_dims=800,
fc_dims=256,
pad=2,
upsample_factors=(4, 4, 16),
compute_dims=128,
res_out_dims=64*2,
res_blocks=10
Now I am trying to run inference on CPU using the C++ library. I compiled the library and ran convert_model.py, but when I try to run inference I get Aborted (core dumped).
If I use the model weights you shared in the above comment it runs perfectly fine.
Anything I might've missed here?
Thanks for your help :)
@alexdemartos Would it be possible for you to obtain a stack trace when this error happens? You may have to recompile in debug mode and either run under gdb or open the core file. It would make it a lot easier to find the cause.
Hi @geneing ,
thanks for your fast response. Sorry, I am not very experienced with C++ debugging. I compiled the library in debug mode, but I don't really know how to get any more detailed info. This is the error from the .so library:
MemoryError: std::bad_alloc
I tried to debug with gdb and the vocoder binary, but it crashes when loading the mel (npy) file (even with your model):
```
(gdb) run -w model.bin -m mels.npy
Starting program: /home/ubuntu/git/WaveRNN-Pytorch/library/debug/vocoder -w model.bin -m mels.npy
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
```
Nevertheless, I noticed loading your model gives the following details:
Loading: model.bin
Loading:Conv1d(80, 64, kernel_size=(5,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_runn
Loading:Conv1d(64, 64, kernel_size=(1,), stride=(1,))
Loading:Stretch2d()
Loading:Stretch2d()
Loading:Conv2d(1, 1, kernel_size=(1, 9), stride=(1, 1), padding=(0, 4),
Loading:Stretch2d()
Loading:Conv2d(1, 1, kernel_size=(1, 11), stride=(1, 1), padding=(0, 5),
Loading:Stretch2d()
Loading:Conv2d(1, 1, kernel_size=(1, 21), stride=(1, 1), padding=(0, 10)
Loading:Linear(in_features=112, out_features=128, bias=True)
Loading:GRU(128, 128, batch_first=True)
Loading:Linear(in_features=160, out_features=128, bias=True)
Loading:Linear(in_features=128, out_features=512, bias=True)
While loading mine, the last part of the model is not there:
Loading: checkpoints/model.bin
Loading:Conv1d(80, 128, kernel_size=(5,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
Loading:Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
Loading:BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_runA
@geneing I tested out the model you have in model_outputs/model.bin.
It's taking me 9.5 seconds to generate 6 seconds of audio. Also, the audio output quality is pretty poor: sample.wav
- What should I expect the performance to be if the model was converted to use CUDA?
- How can I improve audio quality?
@acrosson The network in this repo is designed for best performance on a CPU: low op count, with branching and memory access patterns optimized for pipelined processors. For best performance on a GPU you would use something like WaveGlow: no branching, and a massive op count amortized over thousands of simple compute cores. On my dinky old laptop I can synthesize 9.1 seconds of audio in 8.1 seconds on a single CPU core. There are still some opportunities for further optimization. Either your computer is even slower than mine, or there is something suboptimal in how the code was compiled (the correct -O and -march flags are important).
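One quick check is to time the Python binding directly and compute the real-time factor. This is only a sketch reusing the commands from earlier in this thread; the library path, weight file, mel file, and the 16 kHz output rate are taken from those comments and may need adjusting for your setup:

```python
import sys
import time
import numpy as np

sys.path.insert(0, '../WaveRNN-Pytorch/lib/build-src-RelDebInfo')  # path to the compiled binding
import WaveRNNVocoder

vocoder = WaveRNNVocoder.Vocoder()
vocoder.loadWeights('../WaveRNN-Pytorch/model_outputs/model.bin')

mel = np.load('eval/test_0_mel.npy')  # any mel with first dimension 80

start = time.time()
wav = vocoder.melToWav(mel)
elapsed = time.time() - start

audio_seconds = len(wav) / 16000.0  # assuming 16 kHz output, as in the write_wav call above
print('synthesis: %.2f s for %.2f s of audio (real-time factor %.2f)'
      % (elapsed, audio_seconds, elapsed / audio_seconds))
```

A real-time factor noticeably above 1.0 with the pretrained model usually points at the build flags rather than the model itself.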
For the sound quality, let's check whether it's due to pruning. I observe that the quality drop with pruning is quite sharp past some "critical" pruning fraction. This critical fraction depends on the dataset used for training: with noisier datasets I have to keep more weights after pruning to maintain sound quality.
If you go to your checkpoints/eval directory, you should have wav outputs every 10K steps or so. Listen to the output at around step 40000. If it sounds OK, check later steps. The step at which it starts to sound bad tells you what fraction of the weights you can prune.
For training with https://github.com/mozilla/TTS/ I can prune up to 90% of the weights with little impact on quality. I applied an FIR band-pass filter with a passband from 95 to 7600 Hz to the M-AILABS dataset I used for training. Here's an example of speech synthesized from text: https://drive.google.com/open?id=1mrV_1RuKOyZxk4gp_81A7l9FmX9qhPAt
Here's one synthesized from mels: https://drive.google.com/open?id=1T-D3jHrI8tlb9EwohaAEdFP0ddwK7LfJ
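A rough sketch of such a band-pass filtering step with scipy is shown below. The filter length, sample rate, and file names here are placeholders, not the exact settings used for the dataset above, and the write_wav call uses the older librosa API that appears earlier in this thread:

```python
import numpy as np
import librosa
from scipy.signal import firwin, filtfilt

sr = 16000       # assumed sample rate; use your dataset's actual rate
numtaps = 255    # assumed filter length

# Band-pass FIR filter keeping roughly 95-7600 Hz (requires scipy >= 1.2 for the fs argument)
taps = firwin(numtaps, [95, 7600], pass_zero=False, fs=sr)

wav, _ = librosa.load('some_utterance.wav', sr=sr)   # hypothetical input file
filtered = filtfilt(taps, [1.0], wav)                # zero-phase filtering, no group delay
librosa.output.write_wav('filtered.wav', filtered.astype(np.float32), sr)
```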
@maozhiqiang How did you export the model for C++ inference? Thank you.
Hi all, when I run "python test_wavernnvocoder.py" I get this error: ImportError: /WaveRNN-Pytorch/library/build/WaveRNNVocoder.so: undefined symbol: PyThread_tss_get. Can anyone tell me how to fix it? Thank you very much!
@geneing Hello, with the hparams below I get a good result, but inference takes about 8 seconds. Is there any way to speed up inference? Thank you.
Model parameters:
rnn_dims=400,
fc_dims=256,
pad=2,
upsample_factors=(4, 5, 10),
compute_dims=64,
res_out_dims=32*2,
res_blocks=3,