Transforms and Vocoder relation

Open alasokolova opened this issue 1 year ago • 6 comments

Hey, I am trying to run msanii for an inpainting task on my data, and the results are extremely noisy. I noticed that even if I skip the inpainter itself and just run the transforms followed by __vocode, I don't get an identical result. This is especially the case when use_neural_vocoder is set to true (with just the inverse transform it is less noisy). I suppose there is some configuration I am missing. Could you please tell me if I need to fix something to make it work?

Thanks in advance!

alasokolova avatar Feb 26 '25 16:02 alasokolova

Hello, what type of data are you working with?

Kinyugo avatar Feb 26 '25 16:02 Kinyugo

I have a short .wav audio file and want to inpaint it according to the mask I specify in the config.

alasokolova avatar Feb 26 '25 16:02 alasokolova

Is it a piano song?

Kinyugo avatar Feb 26 '25 16:02 Kinyugo

Should your method work only with piano audio? Mine is not piano.

alasokolova avatar Feb 26 '25 16:02 alasokolova

Yeah. The neural vocoder was trained on the POP909 dataset rendered using FluidSynth, so it cannot generalize well to other data.

Kinyugo avatar Feb 26 '25 16:02 Kinyugo

Got it, thank you

alasokolova avatar Feb 26 '25 17:02 alasokolova