conditional-flow-matching Inpainting with CFM

I've been having great success using CFM instead of diffusion methods for audio tasks so far. Kudos for the great library!

One thing I'm having trouble wrapping my head around is the most correct way to formulate the inpainting task.

With denoising diffusion, the RePaint method is extremely intuitive and works well in practice, but I think it's more complicated for flow matching?

Jul 11 '24 18:07 lukasschmit

Cool! I have not experimented with this. I'm curious if you've tried the same strategy for flow matching? My feeling is the same trick may work.

Jul 12 '24 14:07 atong01

@atong01 I think it should, with the caveat that you might have to integrate the clean target through the vector field (the network) up to the current noisy timestep. Just using sample_xt, as we do for training, did not work.
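One possible reading of "integrate the clean target up to the current timestep" is sketched below. This is a minimal NumPy sketch under stated assumptions, not the library's API: `reverse_trajectory`, `inpaint_replace`, and `constant_field` are hypothetical names, `constant_field` is a toy stand-in for a trained network, and the time convention (t=0 is noise, t=1 is data) is assumed. The clean target is Euler-integrated backward through the vector field once, and the stored states are pasted into the known region at each forward step.

```python
import numpy as np

def reverse_trajectory(vector_field, x_clean, n_steps):
    """Euler-integrate dx/dt = v(x, t) backward from t=1 (data) to t=0,
    keeping every intermediate state of the clean target."""
    dt = 1.0 / n_steps
    states = [None] * (n_steps + 1)
    states[n_steps] = x_clean.copy()
    for i in range(n_steps, 0, -1):
        t = i * dt
        states[i - 1] = states[i] - dt * vector_field(states[i], t)
    return states  # states[i] approximates the clean target at t = i / n_steps

def inpaint_replace(vector_field, x_clean, known_mask, n_steps=50, rng=None):
    """RePaint-style replacement: at each step, overwrite the known region
    with the clean target integrated to the current timestep."""
    rng = np.random.default_rng() if rng is None else rng
    known = known_mask.astype(bool)
    ref = reverse_trajectory(vector_field, x_clean, n_steps)
    x = rng.standard_normal(x_clean.shape)  # start from noise at t=0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = np.where(known, ref[i], x)          # paste the reference state
        x = x + dt * vector_field(x, i * dt)    # one forward Euler step
    return np.where(known, x_clean, x)          # known region is exact at t=1

# Toy usage: a constant field (assumption, stands in for the network).
shift = np.array([1.0, -2.0, 0.5, 3.0])

def constant_field(x, t):
    return np.broadcast_to(shift, x.shape)

x_clean = np.array([0.3, -0.7, 1.2, 0.0])
known_mask = np.array([1, 1, 0, 0])  # 1 = known, 0 = to be inpainted
out = inpaint_replace(constant_field, x_clean, known_mask,
                      n_steps=50, rng=np.random.default_rng(0))
```

With a real network the backward and forward Euler passes are only approximately inverse to each other, so a finer step size or a higher-order solver may be needed for the known region to stay consistent.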

I think there is another possible approach: use the mask and the clean target to zero out the vector field (the network output), indicating that the unmasked regions have no derivative and don't change at any timestep. Then, at every network forward pass, we force the input to equal the clean target in the unmasked regions. With this approach, though, the network input would be noisy in some regions and clean in others, which is a training/inference mismatch unless the network was trained with only some regions corrupted.
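The second approach can be sketched as follows. Again this is a minimal NumPy sketch, not the library's API: `inpaint_masked_euler` and `toy_field` are hypothetical names, and `toy_field` is a toy stand-in for a trained network (the exact field that transports any point to a single fixed target along straight paths). The time convention t=0 noise, t=1 data is assumed.

```python
import numpy as np

def inpaint_masked_euler(vector_field, x_clean, known_mask, n_steps=50, rng=None):
    """Euler-integrate dx/dt = v(x, t) from t=0 to t=1 while holding the
    known (unmasked) region fixed at the clean target."""
    rng = np.random.default_rng() if rng is None else rng
    known = known_mask.astype(bool)
    x = rng.standard_normal(x_clean.shape)  # start from noise at t=0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        # Force the network input to the clean target where it is known.
        x = np.where(known, x_clean, x)
        v = vector_field(x, t)
        # Zero the derivative in the known region so it never moves.
        v = np.where(known, 0.0, v)
        x = x + dt * v
    return np.where(known, x_clean, x)

# Toy usage: every sample is transported to one fixed target (assumption).
target = np.full(4, 2.0)

def toy_field(x, t):
    return (target - x) / (1.0 - t)

x_clean = np.array([0.3, -0.7, 1.2, 0.0])
known_mask = np.array([1, 1, 0, 0])  # 1 = known, 0 = to be inpainted
out = inpaint_masked_euler(toy_field, x_clean, known_mask,
                           n_steps=50, rng=np.random.default_rng(0))
```

Note that the network here sees a mix of clean and noisy regions at every step, which is exactly the training/inference mismatch described above; the toy field does not expose it because its output at each position ignores the others.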

Jul 12 '24 20:07 lukasschmit

@lukasschmit

I think training-free inpainting methods would work for both SGM and FM, since the sampling process is no different. Could you tell us more? RePaint is no longer the best training-free inpainting method; you could check out this flow-based RePaint-style method: https://arxiv.org/pdf/2310.04432

Aug 12 '24 08:08 dapaoA