
Attempting!

Open binarythinktank opened this issue 5 years ago • 11 comments

Hi

Trying to get this to work, with a custom sample set. I'm stuck at the end of this: https://github.com/ivanvovk/DurIAN#6-how-to-align-your-own-data

I don't see how to go from the many .TextGrid files to the 2 filelists that the config needs. You mentioned using the textgrid package in Python, but I'm not a Python dev; do you have a script to convert all of the TextGrid files?

Cheers

binarythinktank avatar Sep 11 '20 11:09 binarythinktank

ok, I have figured out how to read the data files:

```python
import sys

import textgrid

# Load one aligned TextGrid and print the intervals of its second tier.
f = './alligned/wavs3/{}.TextGrid'.format(sys.argv[1])
tg = textgrid.TextGrid.fromFile(f)

for i in tg.tiers[1].intervals:
    print(i)
```

which gives me a bunch of these: `Interval(0.0, 0.12, sil)`

I don't entirely see how this converts to your filelist format though...

It looks something like: [original text]|[the first numbers from all of the intervals??]|[the second numbers from all of the intervals??]|[the text from all of the intervals]|[filename]

You say to "convert to frames" in the doc; is that just x 100?
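For what it's worth, here is a minimal sketch of the seconds-to-frames conversion. The sample rate and hop length below are assumed defaults, not values taken from this repo; check your preprocessing config. The "x 100" shortcut only holds when `sample_rate / hop_length` happens to equal 100.

```python
# Sketch: convert a TextGrid interval duration (seconds) to a mel-frame
# count. 22050 Hz and hop 256 are assumptions, NOT this repo's settings.

def seconds_to_frames(duration_s, sample_rate=22050, hop_length=256):
    """Number of mel frames spanned by `duration_s` seconds of audio."""
    return round(duration_s * sample_rate / hop_length)

# Interval(0.0, 0.12, sil) -> duration 0.12 s
print(seconds_to_frames(0.12 - 0.0))
```

So rather than multiplying by a fixed 100, compute frames from the actual `sample_rate` and `hop_length` used to extract your mel-spectrograms.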

binarythinktank avatar Sep 12 '20 02:09 binarythinktank

On https://github.com/carankt/FastSpeech2.git, check `Generate_FileList.ipynb`. Hope it helps!

carankt avatar Sep 13 '20 13:09 carankt

many thanks!

binarythinktank avatar Sep 14 '20 00:09 binarythinktank

@carankt tried running the notebook but getting:

```
ModuleNotFoundError                       Traceback (most recent call last)
      3 import librosa
      4 import textgrid
----> 5 from utils.files import get_files

ModuleNotFoundError: No module named 'utils.files'
```

binarythinktank avatar Sep 14 '20 06:09 binarythinktank

@binarythinktank clone the repo and then run the notebook from inside it; the notebook imports a module from the repo located at `utils/files`.

carankt avatar Sep 14 '20 06:09 carankt

@carankt same error... (to be clear, I cloned your repo, https://github.com/carankt/FastSpeech2)

binarythinktank avatar Sep 14 '20 06:09 binarythinktank

Take the latest version of the repo; I added the missing module.

carankt avatar Sep 14 '20 06:09 carankt

Thanks, now it works. This approach is definitely different from my attempt to pull the relevant data out myself: your train.txt file is only 4 MB, while my attempt somehow ended up at 10 GB...

`python3 .\train_fastspeech.py` is failing though; I'll need to debug again.

binarythinktank avatar Sep 14 '20 07:09 binarythinktank

Nope, I'm stuck again. If anyone has any idea...

```
2020-09-14 16:28:53,162 - INFO - ngpu: 1
2020-09-14 16:28:53,162 - INFO - random seed = 1
Using cache found in C:\Users\ml/.cache\torch\hub\seungwonpark_melgan_master
Model is loaded ...
New Training Batch Size : 16
Trainable Parameters: 26.691M
Loading train data:   0%|          | 0/160 [00:06<?, ?it/s]
Traceback (most recent call last):
  File ".\train_fastspeech.py", line 452, in <module>
    main(sys.argv[1:])
  File ".\train_fastspeech.py", line 448, in main
    train(args, hp, hp_str, logger, vocoder)
  File ".\train_fastspeech.py", line 93, in train
    for data in pbar:
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\tqdm\std.py", line 1130, in __iter__
    for obj in iterable:
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\dataloader.py", line 354, in __next__
    data = self._next_data()
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\dataloader.py", line 980, in _next_data
    return self._process_data(data)
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\dataloader.py", line 1005, in _process_data
    data.reraise()
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\_utils\worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\ml\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\home\Downloads\FastSpeech2\dataset\dataloader.py", line 51, in __getitem__
    x = phonemes_to_sequence(x)
  File "D:\home\Downloads\FastSpeech2\dataset\texts\__init__.py", line 175, in phonemes_to_sequence
    sequence = [_phoneme_to_id[s] for s in string]
  File "D:\home\Downloads\FastSpeech2\dataset\texts\__init__.py", line 175, in <listcomp>
    sequence = [_phoneme_to_id[s] for s in string]
KeyError: 'ER0'
```

binarythinktank avatar Sep 14 '20 08:09 binarythinktank

The filelist you generated using my repo uses a different symbol set than this one. You'll need to adjust either the filelist or this repo's symbol set to move forward.
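One possible workaround (an assumption, not either repo's official fix): MFA emits ARPAbet phones with stress digits ("ER0", "AH1", ...), so if the target symbol table only lists stress-less phones, you can either add the stressed variants to the table or strip the digits before the lookup that raised `KeyError: 'ER0'`. The helper names below are illustrative; `phoneme_to_id` stands in for the repo's `_phoneme_to_id` dict.

```python
# Sketch: fall back to the stress-less ARPAbet phone when the stressed
# variant is missing from the symbol table. Names here are hypothetical.

def strip_stress(phoneme):
    """Map 'ER0' -> 'ER', 'AH1' -> 'AH'; leave 'sil' etc. untouched."""
    return phoneme.rstrip("0123456789")

def to_sequence(phonemes, phoneme_to_id):
    seq = []
    for p in phonemes:
        if p not in phoneme_to_id:
            p = strip_stress(p)
        if p not in phoneme_to_id:
            raise KeyError(f"unknown phoneme {p!r}; extend the symbol table")
        seq.append(phoneme_to_id[p])
    return seq

table = {"ER": 0, "AH": 1, "sil": 2}
print(to_sequence(["ER0", "AH1", "sil"], table))  # -> [0, 1, 2]
```

Whether stripping stress is acceptable depends on the model config; keeping stress and extending the symbol table preserves more information.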

carankt avatar Sep 15 '20 04:09 carankt

Which symbols? Not the separator; both use the pipe `|` character?

I will, in parallel, also attempt to use your version.

Have I put the correct values into your config?

```
wav_path = lab_path = '../melgan/wavs3'    # contains the .wav and .lab files
csv_path = '../melgan/texts.csv'           # not sure what this should be; I put the csv I used to create the .lab files: [filename without .wav]|[spoken text]\n
dict_path = 'cmudict.txt'                  # not sure what this is; is it the dictionary file in MFA? mine is called english.dict
output_directory = 'training_log'          # created this folder
log_directory = 'fastspeech2-wavs3'        # created this folder
data_path = data_dir = '../melgan/data/'   # output folder of preprocess from this repo; will it work?
filelist_alignment_dir = teacher_dir = '../melgan/alligned/wavs3'  # raw output from MFA; will it work?
training_files = '../melgan/filelists/train.txt'    # converted from the MFA dir by your repo
validation_files = '../melgan/filelists/valid.txt'  # converted from the MFA dir by your repo
```

Wrong `csv_path` file format?

```
Traceback (most recent call last):
  File "train.py", line 144, in <module>
    main()
  File "train.py", line 48, in main
    train_loader, val_loader, collate_fn = prepare_dataloaders(hparams)
  File "FastSpeech2\utils\utils.py", line 13, in prepare_dataloaders
    trainset = TMDPESet(hparams.training_files, hparams)
  File "FastSpeech2\utils\data_utils.py", line 20, in __init__
    self.audiopaths_and_text = load_filepaths_and_text(audiopaths_and_text, hparams.data_path)
  File "FastSpeech2\utils\data_utils.py", line 12, in load_filepaths_and_text
    file_name, text1, text2 = line.strip().split('|')
ValueError: too many values to unpack (expected 3)
```
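The traceback says the loader expects exactly 3 pipe-separated fields per line (`file_name, text1, text2 = line.strip().split('|')`), so some line in the filelist has more. A quick diagnostic (the helper below is hypothetical, not part of either repo) makes the mismatch visible before training:

```python
# Sketch: report filelist lines whose pipe-separated field count differs
# from what the loader expects (3). Returns (line number, field count).

def check_filelist_lines(lines, expected_fields=3):
    bad = []
    for lineno, line in enumerate(lines, 1):
        n = len(line.strip().split("|"))
        if n != expected_fields:
            bad.append((lineno, n))
    return bad

lines = ["a.wav|HH AH0|10 12", "b.wav|text|dur|extra"]
print(check_filelist_lines(lines))  # -> [(2, 4)]
```

Run it over the generated train.txt (e.g. `check_filelist_lines(open('train.txt'))`) to see which lines carry extra fields and therefore which format the two repos disagree on.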

binarythinktank avatar Sep 15 '20 08:09 binarythinktank