
Making a dataset

peanut1101 opened this issue 3 years ago • 4 comments

How do I create a dataset?

peanut1101 avatar Aug 09 '22 05:08 peanut1101

First of all, you need to collect audio data in wav/mp3 format and then use this (or an analogous tool) to annotate your audio.
Step 2 – export your annotations as a CSV file and a .zip of the wavs (then unzip it).
Step 3 – create configs for your data and run python3 prepare_align.py config/LJSpeech/preprocess.yaml. This creates the raw_data dir with the .wav files and the .lab files, which contain the text of each corresponding wav.
Step 4 – install MFA or another aligner and:

  • create a pronunciation dictionary for your data using MFA or another tool (see the sketch after this list)

  • train the MFA aligner, for example: mfa train raw_data/your_dataset/ lexicon/your_dataset.txt out_dir/aligner_model.zip preprocessed_data/your_dataset/TextGrid
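For the pronunciation dictionary, one option (besides MFA's own g2p models) is to generate an ARPAbet lexicon from the .lab transcripts with g2p_en, which as far as I know this repo already uses for English text. A minimal sketch, with the paths and the lexicon filename as assumptions:

```python
# Minimal sketch: build an MFA-style pronunciation dictionary (word -> ARPAbet phones)
# from the .lab transcripts that prepare_align.py wrote. The paths are assumptions;
# the lexicon format is one word per line followed by its space-separated phones.
import glob
import os
import re

from g2p_en import G2p

g2p = G2p()
words = set()
for lab_path in glob.glob("raw_data/your_dataset/**/*.lab", recursive=True):
    with open(lab_path, encoding="utf-8") as f:
        words.update(re.findall(r"[a-z']+", f.read().lower()))

os.makedirs("lexicon", exist_ok=True)
with open("lexicon/your_dataset.txt", "w", encoding="utf-8") as f:
    for word in sorted(words):
        # keep only phone tokens (drop the spaces/punctuation that g2p_en also returns)
        phones = [p for p in g2p(word) if p.strip().isalnum()]
        if phones:
            f.write(f"{word} {' '.join(phones)}\n")
```

If you are on MFA 2.x, mfa validate can then flag any out-of-vocabulary words before you train the aligner, if I recall the CLI correctly.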

Step 5 – run python3 preprocess.py config/LJSpeech/preprocess.yaml. It creates the preprocessed_data dir with the data for training.
Step 6 – run model training.
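To check that step 5 worked, a small script like the sketch below can inspect the generated files. The file names and line layout here are assumptions taken from the LJSpeech recipe, so verify them against your own preprocessed_data dir:

```python
# Quick sanity check of what preprocess.py wrote (paths and the exact file
# layout are assumptions based on the LJSpeech recipe; adjust to your config).
import json

base = "preprocessed_data/your_dataset"

with open(f"{base}/speakers.json") as f:
    speakers = json.load(f)   # assumed: {"speaker_name": integer_id, ...}
with open(f"{base}/stats.json") as f:
    stats = json.load(f)      # assumed: {"pitch": [...], "energy": [...]}

print(len(speakers), "speakers:", speakers)
print("pitch stats:", stats["pitch"])    # assumed order: [min, max, mean, std]
print("energy stats:", stats["energy"])

# Each metadata line is assumed to look like: basename|speaker|{phone sequence}|raw text
with open(f"{base}/train.txt", encoding="utf-8") as f:
    first = f.readline().rstrip("\n")
basename, speaker, phones, raw_text = first.split("|", 3)
print(basename, speaker, phones, raw_text, sep="\n")
```

As far as I can tell, speakers.json, stats.json, train.txt and val.txt are all produced by preprocess.py in this recipe, so they normally do not need to be written by hand.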

ruslantau avatar Aug 22 '22 09:08 ruslantau

Thanks a lot for your contribution, @ruslantau. I have a question: in order to train I also need speakers.json (that one is fine, it is just the mapping of each speaker in the dataset to an id) and stats.json. How do I compute that? Moreover, I need the new transcription lists (val.txt and train.txt). How do I generate those? I already have the train and val lists, but their text is not encoded as the phonemes extracted by MFA. How can I do that?

alessandropec avatar Jan 17 '23 16:01 alessandropec

Hi @alessandropec,

Have you found a way to make stats.json?

phamkhactu avatar Jun 12 '23 08:06 phamkhactu

@phamkhactu read this part https://github.com/ming024/FastSpeech2/blob/d4e79eb52e8b01d24703b2dfc0385544092958f3/preprocessor/preprocessor.py#L116

djanibekov avatar Jan 08 '24 13:01 djanibekov
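For anyone landing here later, the gist of the linked section, as I read it, is to accumulate running pitch/energy statistics over every utterance and dump them to stats.json. A condensed sketch of that idea (not the repo's exact code; the directory layout of per-utterance .npy features is an assumption):

```python
# Condensed sketch of the idea behind the linked code (not the exact source):
# fit a running mean/std over all pitch and energy values, track the global
# min/max, and write them to stats.json for later (de)normalization.
import json
import os

import numpy as np
from sklearn.preprocessing import StandardScaler

base = "preprocessed_data/your_dataset"   # assumed layout: pitch/*.npy, energy/*.npy

def load_all(feature_dir):
    """Yield the 1-D feature array saved for each utterance."""
    for name in os.listdir(feature_dir):
        if name.endswith(".npy"):
            yield np.load(os.path.join(feature_dir, name))

pitch_scaler, energy_scaler = StandardScaler(), StandardScaler()
p_min = e_min = float("inf")
p_max = e_max = float("-inf")

for pitch in load_all(os.path.join(base, "pitch")):
    if len(pitch):
        pitch_scaler.partial_fit(pitch.reshape(-1, 1))
        p_min, p_max = min(p_min, pitch.min()), max(p_max, pitch.max())

for energy in load_all(os.path.join(base, "energy")):
    if len(energy):
        energy_scaler.partial_fit(energy.reshape(-1, 1))
        e_min, e_max = min(e_min, energy.min()), max(e_max, energy.max())

stats = {
    "pitch":  [float(p_min), float(p_max),
               float(pitch_scaler.mean_[0]), float(pitch_scaler.scale_[0])],
    "energy": [float(e_min), float(e_max),
               float(energy_scaler.mean_[0]), float(energy_scaler.scale_[0])],
}
with open(os.path.join(base, "stats.json"), "w") as f:
    json.dump(stats, f)
```

If I remember the repo correctly, it also rewrites the saved pitch/energy features in normalized form and stores the min/max of the normalized values rather than the raw ones, so take the per-feature [min, max, mean, std] layout as the main point and defer to the linked code for the exact ordering.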