DeepLearningExamples

Tacotron2 Issues with Inference and using a Custom Dataset

Open conceptofmind opened this issue 4 years ago • 0 comments

I believe I am running into an issue when training tacotron2, both from scratch and from the pre-trained model.

I have collected 14 to 17 hours of pre-processed wav files of Obama speaking. Each file was initially normalized with ffmpeg-normalize and then resampled to the recommended 22050Hz.

I have ensured that:

  • the sampling rate of each wav file is 22050Hz
  • there is only a single speaker: Obama
  • the speech contains a variety of phonemes
  • each audio file is split into segments of 10 seconds
  • none of the audio segments has silence at the beginning or end of the file
  • none of the audio segments contains long silences
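
The first and fourth checks in the list above can be scripted. The following is a minimal sketch, assuming the segments are uncompressed PCM wav files that Python's standard `wave` module can open, and treating 10 seconds as an upper bound on segment length. The function names `check_wav` and `check_directory` are hypothetical and not part of the tacotron2 code.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 22050   # Hz, the sampling rate stated in the checklist
MAX_SECONDS = 10.0      # segment length from the checklist, used as an upper bound (assumption)


def check_wav(path, expected_rate=EXPECTED_RATE, max_seconds=MAX_SECONDS):
    """Return a list of problems found in one PCM wav file (empty if none)."""
    problems = []
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        duration = wf.getnframes() / float(rate)
    if rate != expected_rate:
        problems.append(f"sample rate {rate} != {expected_rate}")
    if duration > max_seconds:
        problems.append(f"duration {duration:.2f}s > {max_seconds}s")
    return problems


def check_directory(wav_dir):
    """Map each offending file name in wav_dir to its list of problems."""
    report = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        problems = check_wav(path)
        if problems:
            report[path.name] = problems
    return report
```

Calling `check_directory` on the folder of segments returns an empty dict when every file passes. Silence detection is not covered here, since it needs an amplitude threshold that depends on the recordings.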

Here is a link to a drive containing the wav files for inspection:

https://drive.google.com/drive/folders/17RoPoNhcU6ovW0BBkONt3WEXf6ZvuUwF?usp=download

Here is a link to both of the formatted .txt files (train and val):

Train .txt file: https://drive.google.com/file/d/1dxTkagpAT43jP06QAeODWS92GmuqdPqz/view?usp=sharing

Validation .txt file: https://drive.google.com/file/d/1dtaHPWTFdXLM1QdOVb2V9H2a_VMKVWRg/view?usp=sharing

I formatted the .txt files in the same way as the LJSpeech dataset. I used wav2vec2.0 for transcriptions. I made sure that any spaces at the start and end of each transcription were removed, and that a period was added to the end of each transcript. Each transcript is on its own line.
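
The transcript cleanup described above can be sketched as follows. This is only an illustration, assuming the filelists use one pipe-separated `wav_path|transcript` entry per line; that layout, the helper names `clean_transcript` and `format_filelist`, and the example paths are assumptions rather than something confirmed in this issue.

```python
def clean_transcript(text):
    """Strip surrounding whitespace and make sure the transcript ends with a period."""
    text = text.strip()
    if text and not text.endswith("."):
        text += "."
    return text


def format_filelist(entries):
    """Build one 'wav_path|transcript' line per (wav_path, transcript) pair.

    The pipe-separated layout is an assumption about the expected filelist format.
    Entries whose transcript is empty after cleaning are skipped.
    """
    lines = []
    for wav_path, transcript in entries:
        cleaned = clean_transcript(transcript)
        if cleaned:
            lines.append(f"{wav_path}|{cleaned}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical paths and text, for illustration only.
    example = [("wavs/segment_0001.wav", "  thank you very much "),
               ("wavs/segment_0002.wav", "good evening.")]
    print(format_filelist(example), end="")
```

Skipping empty transcripts is a design choice made for the sketch; a segment with no recognized speech may be better removed from the dataset altogether.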

The train.py script will run. The directory paths and naming conventions are correct.

This is what a graph of the training inference looks like at epochs 0, 50, 100, and 250:

Epoch 0:

[Image: graph at epoch 0]

Epoch 50:

[Image: graph at epoch 50]

Epoch 100:

[Image: graph at epoch 100]

Epoch 250:

[Image: graph at epoch 250]

Is this how the charts should be looking? Any help would be appreciated!

conceptofmind, Jan 27 '22 18:01