About generated samples

Open sunnnnnnnny opened this issue 6 years ago • 1 comments

“A PyTorch implementation of Robust Universal Neural Vocoding. Audio samples can be found here.”

Were the samples at the link you gave generated from the actual (ground-truth) spectrograms, or from spectrograms predicted by an acoustic model?

Jul 16 '19 02:07 sunnnnnnnny

Hi @sunnnnnnnny, the samples on the webpage are generated from the actual mel spectrograms. I haven't had the chance to experiment with something like Tacotron yet, but the model does seem to work reasonably well on "smoothed" spectrograms. For example, the following spectrogram was reconstructed by a vector-quantized autoencoder (VQVAE) trained with an L2 loss:

(image: mel)

Compared to the original:

(image: orig)

I've attached the audio generated from the reconstructed spectrogram. It is a little noisier than the original but not too bad (the VQVAE may also be causing some loss of quality).

sample.zip

The audio corresponds to the first sample from speaker V002 on the webpage.
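To make the two ingredients mentioned above concrete, here is a minimal NumPy sketch of a vector-quantization step and an L2 reconstruction loss on a mel spectrogram. It is only an illustration, not the code used to produce the attached sample: a real VQVAE also has a learned encoder and decoder, and the shapes (100 frames, 80 mel bins, 16 codes), the random data, and the function names `vector_quantize` and `l2_loss` are all assumptions made up for this example.

```python
import numpy as np


def vector_quantize(frames, codebook):
    """Replace each frame with its nearest codebook vector (squared Euclidean distance).

    frames:   array of shape (T, D), one row per spectrogram frame
    codebook: array of shape (K, D), one row per code
    """
    # Pairwise squared distances between every frame and every code: shape (T, K).
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return codebook[indices], indices


def l2_loss(reconstruction, target):
    """Mean squared error between a reconstructed and a target spectrogram."""
    return float(np.mean((reconstruction - target) ** 2))


# Hypothetical data: random values stand in for a real mel spectrogram.
rng = np.random.default_rng(0)
mel = rng.standard_normal((100, 80))      # assumed: 100 frames x 80 mel bins
codebook = rng.standard_normal((16, 80))  # assumed: 16 codebook entries

quantized, indices = vector_quantize(mel, codebook)
loss = l2_loss(quantized, mel)
print(f"L2 loss between quantized and original frames: {loss:.4f}")
```

Because each frame is snapped to one of a small number of codes, the quantized spectrogram loses fine detail, which is one plausible source of the "smoothed" look and the slight loss of quality described above.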

Jul 16 '19 08:07 bshall