Martin Kristiansen
@adrianastan, have you tried concatenating the speaker embeddings to the text encoding (by repeating it for each symbol)?
@adrianastan No, the speaker embedding is summed with the input and positional encoding, not concatenated. This kind of summation should be acceptable for positional encoding, but it is not suited...
@adrianastan In my experiment I increased the dimensionality of the encoder to fit the embedded symbols with speaker information concatenated to them. That way, the encoder receives intact and clearly...
@adrianastan Hope it works out for you. Just wanted to add that I got great results by concatenating the speaker embedding directly to the input of the 1) pitch predictor...
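The concatenation idea above can be sketched in a few lines. This is a minimal numpy illustration (not the repository's actual code), assuming a `(T, d_text)` text encoding and a fixed-size speaker embedding; the dimensions are made up for the example:

```python
import numpy as np

def concat_speaker_embedding(text_enc, spk_emb):
    """Repeat the speaker embedding once per symbol and concatenate it
    to the text encoding along the feature axis.

    text_enc: (T, d_text) encoder input/output for T symbols
    spk_emb:  (d_spk,) fixed-size speaker embedding
    returns:  (T, d_text + d_spk)
    """
    repeated = np.tile(spk_emb, (text_enc.shape[0], 1))  # (T, d_spk)
    return np.concatenate([text_enc, repeated], axis=-1)

# Hypothetical sizes: 10 symbols, 256-dim encoding, 64-dim speaker embedding
out = concat_speaker_embedding(np.zeros((10, 256)), np.ones(64))
print(out.shape)  # (10, 320)
```

Because the concatenated width is `d_text + d_spk`, the encoder (or pitch/energy predictor) input dimensionality has to grow accordingly, which is the change described above.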
@Moon-sung-woo I was facing this problem too. The current implementation doesn't support separate mean/std values for each speaker, but that is clearly needed in the multi-speaker setting. Calculate these values...
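To make the per-speaker statistics concrete, here is a small sketch of the idea, with hypothetical speaker IDs and pitch contours standing in for what the preprocessing step would produce:

```python
import numpy as np

# Hypothetical per-utterance pitch contours grouped by speaker ID;
# in practice these come from the pitch extraction step.
pitch_by_speaker = {
    "spk1": [np.array([110.0, 115.0, 120.0]), np.array([108.0, 112.0])],
    "spk2": [np.array([210.0, 220.0]), np.array([205.0, 215.0, 225.0])],
}

# One mean/std pair per speaker instead of a single global pair.
stats = {}
for spk, contours in pitch_by_speaker.items():
    values = np.concatenate(contours)
    stats[spk] = (values.mean(), values.std())

def normalize_pitch(pitch, spk):
    """Normalize a pitch contour with its own speaker's statistics."""
    mean, std = stats[spk]
    return (pitch - mean) / std
```

Normalizing each speaker with their own statistics keeps a low-pitched and a high-pitched voice on the same scale for the pitch predictor, instead of letting the speaker identity leak into the pitch target.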
Also, to mention another problem with that function: the fmax setting is way too high. `librosa.note_to_hz('C7')` equals 2093 Hz, and nobody speaks at that frequency. Probabilistic YIN takes MUCH...
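To show why C7 is far above any speaking F0, here is the standard MIDI note-to-Hz conversion (the same mapping `librosa.note_to_hz` implements), written out in plain Python; the comparison note C6 is just an illustration of a still-generous lower ceiling:

```python
# MIDI-style note-to-Hz conversion: A4 (MIDI 69) is 440 Hz,
# and each semitone multiplies the frequency by 2**(1/12).
def note_to_hz(midi_note):
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

print(round(note_to_hz(96)))  # C7 (MIDI 96) -> 2093 Hz, far above speech F0
print(round(note_to_hz(84)))  # C6 (MIDI 84) -> 1047 Hz, still generous for speech
```

Since pYIN searches the whole fmin-fmax range, lowering fmax to a few hundred Hz (typical speech F0 tops out well under 500 Hz) also shrinks the search space and speeds the extraction up.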