msanii Transforms and Vocoder relation

Hey, I am trying to run msanii for an inpainting task on my data, and the results are extremely noisy. I noticed that even if I skip the inpainter itself and just run the transforms followed by __vocode, I don't get an identical result. This is especially true when use_neural_vocoder is set to true (with just the inverse transform it is less noisy). I suppose there is some configuration I am missing. Could you please tell me whether I need to change something to make it work?

Thanks in advance!

Feb 26 '25 16:02 alasokolova

Hello, what type of data are you working with?

Feb 26 '25 16:02 Kinyugo

I have a short .wav audio file and want to inpaint it according to the mask I specify in the config.

Feb 26 '25 16:02 alasokolova

Is it a piano song?

Feb 26 '25 16:02 Kinyugo

Should your method work only with piano audio? Mine is not piano.

Feb 26 '25 16:02 alasokolova

Yeah. The neural vocoder was trained on the POP909 dataset rendered using FluidSynth, so it cannot generalize well to other data.
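
To put a number on how much a transforms-then-vocode round trip degrades a given clip, one option is to compare the original and reconstructed waveforms with a signal-to-noise ratio. The snippet below is only a minimal sketch in plain Python and is not part of msanii's API: `snr_db` is a hypothetical helper, and it assumes both waveforms are already loaded as equal-length, time-aligned lists of float samples.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between two equal-length waveforms.

    Hypothetical helper for illustration only (not an msanii function).
    `reference` is the original audio, `estimate` the round-trip output.
    """
    if len(reference) != len(estimate):
        raise ValueError("waveforms must have the same number of samples")

    # Energy of the original signal.
    signal_power = sum(r * r for r in reference)
    # Energy of the sample-wise difference (the reconstruction error).
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))

    if noise_power == 0.0:
        return math.inf  # perfect reconstruction
    if signal_power == 0.0:
        return -math.inf  # silent reference, non-silent error
    return 10.0 * math.log10(signal_power / noise_power)


# Toy usage: an estimate that is the reference scaled by 0.9.
original = [1.0, -1.0, 1.0, -1.0]
round_trip = [0.9, -0.9, 0.9, -0.9]
score = snr_db(original, round_trip)
```

A sample-wise SNR is a strict measure: a vocoder that does not reproduce the original phase can score poorly even when the output sounds acceptable, so the number is most useful for comparing settings on the same clip (for example, use_neural_vocoder on versus off) rather than as an absolute quality score.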

Feb 26 '25 16:02 Kinyugo

Got it, thank you

Feb 26 '25 17:02 alasokolova