Training on custom ENGLISH dataset
I'm trying to train StyleTTS2 from scratch on my custom English dataset. Any urgent help, please? @yl4579
@shaimaalwaisi what's the error?
train_list:
001-0001.wav|ˈɔːl ɐɹˈaʊnd ɑːɹ ˈʌðə kˈɪdz hˈapɪli tʃˈatɪŋ lˈafɪŋ and mˈeɪbiːʲ ˈiːvən ʃˈeəɹɪŋ fˈuːd and ðeəɹ ˈəʊn sˈɜːkəlz ɒv fɹˈɛndʃɪp.|001
001-0002.wav|sˌʌm ɒv ˌʌs ɑː kwˈaɪətli fˈeɪsɪŋ lˈəʊnlinəs and tʃˈalɪndʒɪz ðat mˌeɪk ˌʌs fˈiːl dˈɪfɹənt.|001

val_list:
002-0001.wav|dˈɪd juː nˈəʊ ðɪʲ ˈeɪti pəsˈɛnt ɒv tʃˈɪldɹən tʃˈɛk ðeə fˈəʊnz ˈɛvɹɪ fˈaɪv mˈɪnɪts kɹˈeɪzi ɹˈaɪt.|002
002-0002.wav|hˈaɪ ˈɛvɹɪwˌɒn maɪ nˈeɪm ɪz tˈanə wˈɒltən and tədˈeɪ aɪm ɡˌəʊɪŋ təbi tˈɔːkɪŋ tə juː ɐbˌaʊt sˈɛl fˈəʊn ɐdˈɪkʃən.|002
These are sentences from my train_list and val_list. I'm trying to follow the format filename.wav|transcription|speaker, but something is going wrong with the speaker ID, and I don't know the ideal format to follow. I have two speakers right now; is the format I wrote correct or not? @MARafey @yl4579
@shaimaalwaisi The format is correct. However, the speaker IDs can be written simply as '1' and '2'; '001' is not necessary.
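For anyone hitting the same issue, here is a minimal sketch (not part of StyleTTS2 itself; the function name and checks are my own) that validates each line of a train_list/val_list against the filename.wav|transcription|speaker format discussed above:

```python
# Hypothetical validator for StyleTTS2-style list lines.
# Assumed format (from this thread): filename.wav|phonemized transcription|speaker_id

def check_line(line: str):
    """Split one list line into its three fields and sanity-check each one."""
    parts = line.strip().split("|")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}: {line!r}")
    wav, text, speaker = parts
    if not wav.endswith(".wav"):
        raise ValueError(f"first field should be a .wav filename: {wav!r}")
    if not text:
        raise ValueError("transcription field is empty")
    if not speaker.isdigit():
        raise ValueError(f"speaker ID should be an integer string: {speaker!r}")
    return wav, text, int(speaker)

# Example with a (shortened) entry from this thread; both '001' and '1' pass:
print(check_line("001-0002.wav|sˌʌm ɒv ˌʌs ɑː kwˈaɪətli fˈeɪsɪŋ lˈəʊnlinəs.|1"))
```

Running it over every line of both lists before training should surface a malformed speaker column (e.g. a missing field or a non-numeric ID) immediately.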
@MARafey @yl4579 I ran the first training stage successfully, but I got this error in the second stage (multispeaker dataset, 14 speakers):

support multiprocessing.
Loading the first stage model at /scratch/c_ch_gnn22/Models/CTTdataset/first_stage.pth ...
decoder loaded
text_encoder loaded
style_encoder loaded
text_aligner loaded
pitch_extractor loaded
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.9, 0.99)
    capturable: False
    eps: 1e-09
    foreach: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.01
)
decoder AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.0, 0.99)
    capturable: False
    eps: 1e-09
    foreach: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.0001
)
> /project/c_ch_gnn22/StyleTTS2/train_second.py(460)main()
    458         set_trace()
    459
--> 460         optimizer.step('bert_encoder')
    461         optimizer.step('bert')
    462         optimizer.step('predictor')
ipdb>