
Attempting!

Open binarythinktank opened this issue 5 years ago • 11 comments

Hi

Trying to get this to work, with a custom sample set. I'm stuck at the end of this: https://github.com/ivanvovk/DurIAN#6-how-to-align-your-own-data

I don't see how to go from the many .TextGrid files to the 2 filelists that the config needs. You mentioned using the textgrid package in Python, but I'm not a Python dev; do you have a script to convert all of the TextGrid files?

Cheers

binarythinktank avatar Sep 11 '20 11:09 binarythinktank

ok, I have figured out how to read the data files:

```python
import sys

import textgrid

# Load one aligned TextGrid and print the intervals of its second tier.
f = './alligned/wavs3/{}.TextGrid'.format(sys.argv[1])
tg = textgrid.TextGrid.fromFile(f)

for i in tg.tiers[1].intervals:
    print(i)
```

which gives me a bunch of these: `Interval(0.0, 0.12, sil)`

I don't entirely see how this converts to your filelist format though...

It looks something like: [original text]|[the first numbers from all of the intervals??]|[the second numbers from all of the intervals??]|[the text from all of the intervals]|[filename]

You say to "convert to frames" in the doc; is that just x 100?
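For what it's worth, here is a minimal sketch of the seconds-to-frames conversion. The sample rate and hop length below are assumed defaults, not values taken from this repo; check your preprocessing config. The "x 100" shortcut only holds when `sample_rate / hop_length` happens to equal 100.

```python
# Sketch: convert a TextGrid interval duration (seconds) to a mel-frame
# count. 22050 Hz and hop 256 are assumptions, NOT this repo's settings.

def seconds_to_frames(duration_s, sample_rate=22050, hop_length=256):
    """Number of mel frames spanned by `duration_s` seconds of audio."""
    return round(duration_s * sample_rate / hop_length)

# Interval(0.0, 0.12, sil) -> duration 0.12 s
print(seconds_to_frames(0.12 - 0.0))
```

So rather than multiplying by a fixed 100, compute frames from the actual `sample_rate` and `hop_length` used to extract your mel-spectrograms.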

binarythinktank avatar Sep 12 '20 02:09 binarythinktank

On https://github.com/carankt/FastSpeech2.git, check `Generate_FileList.ipynb`. Hope it helps!

carankt avatar Sep 13 '20 13:09 carankt

many thanks!

binarythinktank avatar Sep 14 '20 00:09 binarythinktank

@carankt tried running the notebook but getting:

```
ModuleNotFoundError                       Traceback (most recent call last)
      3 import librosa
      4 import textgrid
----> 5 from utils.files import get_files

ModuleNotFoundError: No module named 'utils.files'
```

binarythinktank avatar Sep 14 '20 06:09 binarythinktank

@binarythinktank clone the repo and then run the notebook from inside it; the notebook imports a module from the repo located at `utils/files`.

carankt avatar Sep 14 '20 06:09 carankt

@carankt same error... (to be clear, I cloned your repo, https://github.com/carankt/FastSpeech2)

binarythinktank avatar Sep 14 '20 06:09 binarythinktank

Take the latest version of the repo; I added the missing module.

carankt avatar Sep 14 '20 06:09 carankt

Thanks, now it works. This approach is definitely different from my attempt to pull the relevant data out myself: your train.txt file is only 4 MB, while my attempt somehow ended up at 10 GB...

`python3 .\train_fastspeech.py` is failing though; I'll need to debug again.

binarythinktank avatar Sep 14 '20 07:09 binarythinktank

Nope, I'm stuck again. If anyone has any idea...

```
2020-09-14 16:28:53,162 - INFO - ngpu: 1
2020-09-14 16:28:53,162 - INFO - random seed = 1
Using cache found in C:\Users\ml/.cache\torch\hub\seungwonpark_melgan_master
Model is loaded ...
New Training Batch Size : 16
Trainable Parameters: 26.691M
Loading train data:   0%|          | 0/160 [00:06<?, ?it/s]
Traceback (most recent call last):
  File ".\train_fastspeech.py", line 452, in <module>
    main(sys.argv[1:])
  File ".\train_fastspeech.py", line 448, in main
    train(args, hp, hp_str, logger, vocoder)
  File ".\train_fastspeech.py", line 93, in train
    for data in pbar:
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tqdm\std.py", line 1130, in __iter__
    for obj in iterable:
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\dataloader.py", line 354, in __next__
    data = self._next_data()
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\dataloader.py", line 980, in _next_data
    return self._process_data(data)
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\dataloader.py", line 1005, in _process_data
    data.reraise()
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\_utils\worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\home\Downloads\FastSpeech2\dataset\dataloader.py", line 51, in __getitem__
    x = phonemes_to_sequence(x)
  File "D:\home\Downloads\FastSpeech2\dataset\texts\__init__.py", line 175, in phonemes_to_sequence
    sequence = [_phoneme_to_id[s] for s in string]
  File "D:\home\Downloads\FastSpeech2\dataset\texts\__init__.py", line 175, in <listcomp>
    sequence = [_phoneme_to_id[s] for s in string]
KeyError: 'ER0'
```

binarythinktank avatar Sep 14 '20 08:09 binarythinktank

The filelist you generated using my repo uses a different symbol set than this one. You'll need to adjust either the filelist or this repo's symbol set to move forward.
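One possible workaround (an assumption, not either repo's official fix): MFA emits ARPAbet phones with stress digits ("ER0", "AH1", ...), so if the target symbol table only lists stress-less phones, you can either add the stressed variants to the table or strip the digits before the lookup that raised `KeyError: 'ER0'`. The helper names below are illustrative; `phoneme_to_id` stands in for the repo's `_phoneme_to_id` dict.

```python
# Sketch: fall back to the stress-less ARPAbet phone when the stressed
# variant is missing from the symbol table. Names here are hypothetical.

def strip_stress(phoneme):
    """Map 'ER0' -> 'ER', 'AH1' -> 'AH'; leave 'sil' etc. untouched."""
    return phoneme.rstrip("0123456789")

def to_sequence(phonemes, phoneme_to_id):
    seq = []
    for p in phonemes:
        if p not in phoneme_to_id:
            p = strip_stress(p)
        if p not in phoneme_to_id:
            raise KeyError(f"unknown phoneme {p!r}; extend the symbol table")
        seq.append(phoneme_to_id[p])
    return seq

table = {"ER": 0, "AH": 1, "sil": 2}
print(to_sequence(["ER0", "AH1", "sil"], table))  # -> [0, 1, 2]
```

Whether stripping stress is acceptable depends on the model config; keeping stress and extending the symbol table preserves more information.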

carankt avatar Sep 15 '20 04:09 carankt

Which symbols? Not the separator; both use the pipe `|` character?

I will, in parallel, also attempt to use your version.

Have I put the correct values into your config?

```
wav_path = lab_path = '../melgan/wavs3'    # contains the .wav and .lab files
csv_path = '../melgan/texts.csv'           # not sure what this should be; I put the csv I used to create the .lab files: [filename without .wav]|[spoken text]\n
dict_path = 'cmudict.txt'                  # not sure what this is; is it the dictionary file in MFA? mine is called english.dict
output_directory = 'training_log'          # created this folder
log_directory = 'fastspeech2-wavs3'        # created this folder
data_path = data_dir = '../melgan/data/'   # output folder of preprocess from this repo; will it work?
filelist_alignment_dir = teacher_dir = '../melgan/alligned/wavs3'  # raw output from MFA; will it work?
training_files = '../melgan/filelists/train.txt'    # converted from the MFA dir by your repo
validation_files = '../melgan/filelists/valid.txt'  # converted from the MFA dir by your repo
```

Wrong `csv_path` file format?

```
Traceback (most recent call last):
  File "train.py", line 144, in <module>
    main()
  File "train.py", line 48, in main
    train_loader, val_loader, collate_fn = prepare_dataloaders(hparams)
  File "FastSpeech2\utils\utils.py", line 13, in prepare_dataloaders
    trainset = TMDPESet(hparams.training_files, hparams)
  File "FastSpeech2\utils\data_utils.py", line 20, in __init__
    self.audiopaths_and_text = load_filepaths_and_text(audiopaths_and_text, hparams.data_path)
  File "FastSpeech2\utils\data_utils.py", line 12, in load_filepaths_and_text
    file_name, text1, text2 = line.strip().split('|')
ValueError: too many values to unpack (expected 3)
```
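The traceback says the loader expects exactly 3 pipe-separated fields per line (`file_name, text1, text2 = line.strip().split('|')`), so some line in the filelist has more. A quick diagnostic (the helper below is hypothetical, not part of either repo) makes the mismatch visible before training:

```python
# Sketch: report filelist lines whose pipe-separated field count differs
# from what the loader expects (3). Returns (line number, field count).

def check_filelist_lines(lines, expected_fields=3):
    bad = []
    for lineno, line in enumerate(lines, 1):
        n = len(line.strip().split("|"))
        if n != expected_fields:
            bad.append((lineno, n))
    return bad

lines = ["a.wav|HH AH0|10 12", "b.wav|text|dur|extra"]
print(check_filelist_lines(lines))  # -> [(2, 4)]
```

Run it over the generated train.txt (e.g. `check_filelist_lines(open('train.txt'))`) to see which lines carry extra fields and therefore which format the two repos disagree on.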

binarythinktank avatar Sep 15 '20 08:09 binarythinktank