Attempting!
Hi
Trying to get this to work, with a custom sample set. I'm stuck at the end of this: https://github.com/ivanvovk/DurIAN#6-how-to-align-your-own-data
I don't see how to go from the many .TextGrid files to the 2 filelists that the config needs. You mentioned using textgrid in python but I'm not a python dev, do you have a script to convert all of the textgrid files?
Cheers
ok, i have figured out how to read the data files:
import textgrid
import sys

f = './alligned/wavs3/{}.TextGrid'.format(sys.argv[1])
tg = textgrid.TextGrid.fromFile(f)
for i in tg.tiers[1].intervals:
    print(i)
which gives me a bunch of these: Interval(0.0, 0.12, sil)
i don't entirely see how this converts to your filelist format though...
looks something like: [original text]|[the first numbers from all of the intervals??]|[the second numbers from all of the intervals??]|[the text from all of the intervals]|[filename]
you say to "convert to frames" in the doc, is that just x 100?
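On the frames question, a rough sketch rather than the repo's actual method: a frame index is normally time in seconds multiplied by sample_rate / hop_length, so "x 100" would only hold when the hop is exactly 10 ms. The values 22050 and 256 below are assumptions, as is the helper name intervals_to_frames; take the real numbers from the config you train with. The input is a list of (start, end, label) tuples in the same shape as the Interval(0.0, 0.12, sil) output above.

```python
def intervals_to_frames(intervals, sample_rate=22050, hop_length=256):
    """Convert (start_sec, end_sec, label) intervals to per-label frame counts.

    sample_rate and hop_length are assumed values, not taken from the repo;
    replace them with the ones in your own config.
    """
    frames_per_second = sample_rate / hop_length
    labels, durations = [], []
    # Start from the first interval's start time (normally 0.0).
    prev_boundary = int(round(intervals[0][0] * frames_per_second)) if intervals else 0
    for start, end, label in intervals:
        # Round the cumulative end boundary instead of each duration, so
        # rounding errors do not pile up over a long utterance.
        boundary = int(round(end * frames_per_second))
        labels.append(label)
        durations.append(boundary - prev_boundary)
        prev_boundary = boundary
    return labels, durations


# Made-up intervals, only to show the shape of the input and output.
example = [(0.0, 0.12, "sil"), (0.12, 0.30, "HH"), (0.30, 0.45, "AH0")]
labels, durations = intervals_to_frames(example)
print(" ".join(labels))
print(" ".join(str(d) for d in durations))
```

With the textgrid package, the tuples could probably be built from the loop above as (i.minTime, i.maxTime, i.mark) for each interval. Whether the filelist wants durations or start/end frame indices is something to check against the repo's sample filelists.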
In https://github.com/carankt/FastSpeech2.git
check the Generate_FileList.ipynb notebook.
Hope it Helps!
many thanks!
@carankt tried running the notebook but getting:
ModuleNotFoundError Traceback (most recent call last)
ModuleNotFoundError: No module named 'utils.files'
@binarythinktank clone the repo and then run. The notebook requires a module from the repo located in utils/files
@carankt same... (to be clear, i cloned your repo, https://github.com/carankt/FastSpeech2)
Take the latest version of the repo. Added the missing module.
thanks, now it works. this approach is definitely different from my attempt to pull the relevant data out. Your train.txt file is only 4 MB; in my attempt, it somehow ended up being 10 GB....
python3 .\train_fastspeech.py is failing though, I'll need to debug again.
nope, i'm stuck again. If anyone has any idea...
2020-09-14 16:28:53,162 - INFO - ngpu: 1
2020-09-14 16:28:53,162 - INFO - random seed = 1
Using cache found in C:\Users\ml/.cache\torch\hub\seungwonpark_melgan_master
Model is loaded ...
New Training
Batch Size : 16
Trainable Parameters: 26.691M
Loading train data: 0%| | 0/160 [00:06<?, ?it/s]
Traceback (most recent call last):
File ".\train_fastspeech.py", line 452, in
The filelist you generated using my repo uses different symbols than this one. You must make adjustments to the filelist or the type of symbols in this repo to move forward.
which symbols? not the separator, both are using the pipe | character?
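One way to see which symbols are meant is to list every token in the filelist that the target repo's symbol set does not contain. This is only a sketch under assumptions: the helper find_unknown_symbols is hypothetical, the phoneme sequence is assumed to sit in pipe-separated field 3 (0-based) with tokens separated by spaces, and known_symbols has to be filled in from whatever symbol list the training code actually uses.

```python
def find_unknown_symbols(filelist_lines, known_symbols, field_index=3, sep="|"):
    """Return {token: [line numbers]} for tokens missing from known_symbols.

    field_index and the space-separated token layout are assumptions;
    adjust them to match the real filelist.
    """
    unknown = {}
    for line_no, line in enumerate(filelist_lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if field_index >= len(fields):
            continue  # line has fewer fields than expected; skip it
        for token in fields[field_index].split():
            if token not in known_symbols:
                unknown.setdefault(token, []).append(line_no)
    return unknown


# Made-up data, only to illustrate the call.
known = {"sil", "HH", "AH0"}
lines = ["hello|0 10 26|10 26 39|sil HH AH0 L OW1|sample_0001\n"]
print(find_unknown_symbols(lines, known))
```

For a real file, the lines could be read with open('filelists/train.txt', encoding='utf-8') and passed in directly. An empty result would suggest the mismatch lies somewhere other than the symbol inventory.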
I will, in parallel, also attempt to use your version.
Have I put the correct values into your config?
wav_path = lab_path = '../melgan/wavs3' (contains the .wav and .lab files)
csv_path = '../melgan/texts.csv' (not sure what this should be, I put a csv file I used to create the .lab files, it has [filename without .wav]|[spoken text]\n )
dict_path = 'cmudict.txt' (not sure what this is, is the dictionary file in MFA? mine is called english.dict)
output_directory = 'training_log' (created this folder)
log_directory = 'fastspeech2-wavs3' (created this folder)
data_path = data_dir = '../melgan/data/' (output folder of preprocess from this repo, will it work?)
filelist_alignment_dir=teacher_dir = '../melgan/alligned/wavs3' (raw output from mfa, will it work?)
training_files='../melgan/filelists/train.txt' (converted from mfa dir by your repo)
validation_files='../melgan/filelists/valid.txt' (converted from mfa dir by your repo)
wrong csv_path file format?
Traceback (most recent call last):
File "train.py", line 144, in