StyleTTS2

Possible bug: input parameters for model.predictor_encoder and model.style_encoder in train_finetune.py

Open starmoon-1134 opened this issue 1 year ago • 0 comments

In train_finetune.py, the input passed to model.predictor_encoder and model.style_encoder looks like a potential bug. Both encoders always receive gt:

s = model.style_encoder(gt.unsqueeze(1))
s_dur = model.predictor_encoder(gt.unsqueeze(1))

In train_second.py, however, the same two calls account for the multispeaker case: they pass st instead of gt when multispeaker is set.

s_dur = model.predictor_encoder(st.unsqueeze(1) if multispeaker else gt.unsqueeze(1))
s = model.style_encoder(st.unsqueeze(1) if multispeaker else gt.unsqueeze(1))
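To make the difference concrete, here is a minimal sketch of the train_second.py selection factored into a helper. The names style_inputs, FakeMel, and fake_model are hypothetical and not part of StyleTTS2. The stand-ins only record which input each encoder received, so this shows the selection logic, not real encoder behavior.

```python
from types import SimpleNamespace


def style_inputs(model, gt, st, multispeaker):
    """Pick the encoder input as train_second.py does: st if multispeaker, else gt."""
    ref = st if multispeaker else gt
    ref = ref.unsqueeze(1)  # same unsqueeze(1) call as in both training scripts
    s_dur = model.predictor_encoder(ref)
    s = model.style_encoder(ref)
    return s, s_dur


class FakeMel:
    """Hypothetical stand-in for a tensor; it only remembers its name."""

    def __init__(self, name):
        self.name = name

    def unsqueeze(self, dim):
        # A real tensor would gain a dimension here; the stand-in is unchanged.
        return self


# Hypothetical stand-in model: each encoder reports which input it was given.
fake_model = SimpleNamespace(
    style_encoder=lambda x: ("style", x.name),
    predictor_encoder=lambda x: ("predictor", x.name),
)
```

With multispeaker set, both encoders see st. Otherwise both see gt, which is what train_finetune.py currently does unconditionally.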

starmoon-1134, May 29 '24 08:05