youngsuenXMLY
Hi, I used a model at step 59000, and the VC total loss had reduced to around 1.2, but all inference samples result in almost nothing. They looked like this: ...
I get almost the same results as you, @JRMeyer. Have you solved the problem?
My test results: [test_samples.zip](https://github.com/jxzhanggg/nonparaSeq2seqVC_code/files/4278949/test_samples.zip)
In the pre-train folder, I use a decay rate of 0.95 at each epoch and discard training samples whose frame length is longer than 800. The inferred results begin to make...
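In case it helps, here is a minimal sketch of those two tweaks; the names (`dataset`, the toy `Linear` model) are illustrative, not from the repo:

```python
import torch

MAX_FRAMES = 800  # discard clips longer than this many mel frames

# two toy "mel-spectrograms" of shape (n_mels, n_frames)
dataset = [torch.zeros(80, 500), torch.zeros(80, 900)]
kept = [mel for mel in dataset if mel.shape[-1] <= MAX_FRAMES]
print(len(kept))  # 1 -- the 900-frame clip is dropped

# exponential decay: multiply the learning rate by 0.95 after every epoch
model = torch.nn.Linear(80, 80)  # stand-in for the VC model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
for epoch in range(3):
    # ... one training epoch over the filtered dataset would go here ...
    scheduler.step()
    print(epoch, optimizer.param_groups[0]['lr'])
```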
Hi, in the feature extraction process, I trimmed silence using librosa.effects.trim, and I used the 80-dimensional mel-spectrogram as specified in hparams.py. The text looks like this:  But the mean...
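For reference, a sketch of that extraction step; the file path and the STFT/trim parameters (`top_db=25`, `n_fft=1024`, `hop_length=256`) are assumptions here, and the actual values in hparams.py may differ:

```python
import librosa
import numpy as np

# 'sample.wav' is a placeholder path
y, sr = librosa.load('sample.wav', sr=16000)

# trim leading/trailing silence (the function is librosa.effects.trim)
y_trimmed, _ = librosa.effects.trim(y, top_db=25)

# 80-band mel-spectrogram, log-compressed
mel = librosa.feature.melspectrogram(y=y_trimmed, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=80)
log_mel = np.log(np.clip(mel, a_min=1e-5, a_max=None))
print(log_mel.shape)  # (80, n_frames)
```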
In pre-train/model/layers.py, lines 353-354, I changed the code to `self.initialize_decoder_states(memory, mask=(1 - get_mask_from_lengths(memory_lengths)))`, because I found that `~` is a bitwise NOT: on a uint8 mask, `~1` gives 254.
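A small snippet demonstrating the pitfall (standalone, not from the repo):

```python
import torch

# On a uint8 (ByteTensor) mask, ~ is a bitwise NOT, so ~1 == 254 (still
# truthy!), which silently corrupts the mask. (1 - mask) flips 0/1 correctly.
byte_mask = torch.tensor([0, 1], dtype=torch.uint8)
print(~byte_mask)     # tensor([255, 254], dtype=torch.uint8)
print(1 - byte_mask)  # tensor([1, 0], dtype=torch.uint8)

# On a bool tensor (PyTorch >= 1.2), ~ is a logical NOT and works as intended.
bool_mask = torch.tensor([False, True])
print(~bool_mask)     # tensor([ True, False])
```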
I can't find any difference from the source code. So would you please send me a copy of your training text and phn files? @jxzhanggg
I conducted the experiment on Ubuntu 16.04, using PyTorch 1.3.1 and Python 3.7. For a boolean tensor, `~True` gives `False` and `~False` gives `True`. I will debug it. Please send me a copy...
After fixing the bitwise inversion `~`, the model began to converge to reasonable speech. One problem is that the inferred result doesn't keep the speaking style from the speaker embeddings, which...
Have you tried a VAE loss to further disentangle the content embedding from the speaker embedding?
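To be concrete, I mean something like the standard VAE KL regularizer applied to the content encoder output; this is just a sketch of the idea, not code from this repo, and the embedding shapes are hypothetical:

```python
import torch

def kl_divergence(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ): the standard VAE regularizer.
    # Applied to the content encoder output, it pressures the content
    # embedding toward a speaker-independent prior.
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

mu = torch.zeros(4, 128)      # hypothetical content-embedding mean
logvar = torch.zeros(4, 128)  # hypothetical log-variance
print(kl_divergence(mu, logvar))  # 0.0 when the posterior equals the prior
```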