WaveFlow: A Compact Flow-based Model for Raw Audio
Update: Pretrained weights are now available. See links below.
This is an unofficial PyTorch implementation of the WaveFlow (Ping et al., ICML 2020) model.
The aim of this repo is to provide an easy-to-use PyTorch version of WaveFlow as a drop-in alternative to the various neural vocoder models used with NVIDIA's Tacotron2 audio processing backend.
Please refer to the official implementation written in PaddlePaddle for the official results.
Setup
- Clone this repo and install the requirements:

  ```bash
  git clone https://github.com/L0SG/WaveFlow.git
  cd WaveFlow
  pip install -r requirements.txt
  ```

- Install Apex for mixed-precision training.
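For reference, Apex-style mixed precision typically wraps the model and optimizer with `amp.initialize` and scales the loss during the backward pass. Below is a minimal sketch with a stand-in model; it is not this repo's actual training loop, which is toggled by the `fp16_run` config flag instead:

```python
import torch
from apex import amp  # NVIDIA Apex, installed above

# Stand-in model/optimizer; the real objects are built by train.py.
model = torch.nn.Linear(80, 80).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

# "O1" patches selected ops to FP16 while keeping FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss = model(torch.randn(4, 80, device="cuda")).pow(2).mean()
optimizer.zero_grad()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # loss scaling guards against FP16 underflow
optimizer.step()
```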
Train your model
- Download LJ Speech Data. In this example it's in `data/`.

- Make a list of the file names to use for training/testing:

  ```bash
  ls data/*.wav | tail -n+10 > train_files.txt
  ls data/*.wav | head -n10 > test_files.txt
  ```

  `-n+10` and `-n10` indicate that this example reserves the first 10 audio clips for model testing.
- Edit the configuration file and train the model. Below are example commands using `waveflow-h16-r64-bipartize.json`:

  ```bash
  nano configs/waveflow-h16-r64-bipartize.json
  python train.py -c configs/waveflow-h16-r64-bipartize.json
  ```

  Single-node multi-GPU training is automatically enabled with DataParallel (instead of DistributedDataParallel, for simplicity); see the sketch after this list.

  For mixed-precision training, set `"fp16_run": true` in the configuration file.

  You can load trained weights from saved checkpoints by providing the path to the `checkpoint_path` variable in the config file. `checkpoint_path` accepts either an explicit checkpoint path, or the parent directory if resuming from weights averaged over multiple checkpoints (see the averaging sketch after this list).

  Examples:

  - Insert `checkpoint_path: "experiments/waveflow-h16-r64-bipartize/waveflow_5000"` in the config file, then run:

    ```bash
    python train.py -c configs/waveflow-h16-r64-bipartize.json
    ```

  - To load weights averaged over the 10 most recent checkpoints, insert `checkpoint_path: "experiments/waveflow-h16-r64-bipartize"` in the config file, then run:

    ```bash
    python train.py -a 10 -c configs/waveflow-h16-r64-bipartize.json
    ```

  - You can reset the optimizer and training scheduler (while keeping the weights) by providing `--warm_start`:

    ```bash
    python train.py --warm_start -c configs/waveflow-h16-r64-bipartize.json
    ```
- Synthesize waveforms from the trained model. Insert `checkpoint_path` in the config file and pass `--synthesize` to `train.py`. The model generates waveforms by looping over `test_files.txt`:

  ```bash
  python train.py --synthesize -c configs/waveflow-h16-r64-bipartize.json
  ```

  If `fp16_run: true`, the model uses FP16 (half-precision) arithmetic for faster inference on GPUs equipped with Tensor Cores (a sketch follows this list).
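To expand on the multi-GPU note above, `DataParallel` wrapping in plain PyTorch looks roughly like the sketch below; the model is a hypothetical stand-in, not the actual logic in `train.py`:

```python
import torch

model = torch.nn.Linear(80, 80)  # stand-in for the WaveFlow model
if torch.cuda.device_count() > 1:
    # Replicates the module on every visible GPU and splits each input
    # batch along dim 0; gradients are reduced back to the default GPU.
    model = torch.nn.DataParallel(model)
model = model.cuda()

out = model(torch.randn(16, 80, device="cuda"))  # batch is scattered across GPUs
```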
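Checkpoint averaging (the `-a` flag above, and `-a 20` for the pretrained weights below) amounts to a per-parameter mean over saved state dicts. A minimal sketch, assuming each checkpoint is a `torch.save`d state dict; the repo's actual checkpoint files may bundle additional training state:

```python
import torch

def average_checkpoints(paths):
    """Per-parameter mean over a list of checkpoint state dicts."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}

# With -a 10, the mean would be taken over the 10 most recent
# checkpoints found under the checkpoint_path directory.
```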
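As for the FP16 synthesis note, half-precision inference in PyTorch is a cast of weights and inputs, which maps the matrix multiplies onto Tensor Cores on supported GPUs; a minimal sketch with a stand-in model:

```python
import torch

model = torch.nn.Linear(80, 80).cuda().eval()  # stand-in for the trained vocoder
mel = torch.randn(1, 80, device="cuda")        # hypothetical conditioning input

model.half()  # cast parameters to FP16 in place
with torch.no_grad():
    audio = model(mel.half())  # FP16 input to match the FP16 weights
```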
Pretrained Weights
We provide pretrained weights via Google Drive. The models were trained for 5M steps, and the weights were then averaged over the last 20 checkpoints with `-a 20`. The audio quality almost matches that of the original paper.
| Models | Download |
|---|---|
| waveflow-h16-r64-bipartize | Link |
| waveflow-h16-r128-bipartize | Link |
Reference
NVIDIA Tacotron2: https://github.com/NVIDIA/tacotron2
NVIDIA WaveGlow: https://github.com/NVIDIA/waveglow
r9y9 wavenet-vocoder: https://github.com/r9y9/wavenet_vocoder
FloWaveNet: https://github.com/ksw0306/FloWaveNet
Parakeet: https://github.com/PaddlePaddle/Parakeet