Moore-AnimateAnyone
Can anyone explain the classifier-free guidance in the ReferenceNet attention?
Why doesn't the unconditional predicted noise use the ReferenceNet features? This may cause a gap between training and inference: during training we only drop out the hidden states, not the ReferenceNet features. In practice, however, we noticed that enabling classifier-free guidance in the ReferenceNet attention performs better than disabling it; the generated video has better color. Can anyone explain this? Thanks.
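For context, here is a minimal sketch of the standard classifier-free guidance combination step; the names and tensor shapes are illustrative, not the repo's actual code. The train/inference gap described above concerns how `eps_uncond` is produced: if the ReferenceNet features are kept for the unconditional pass at inference but were never dropped during training, the two branches differ only in the dropped hidden states.

```python
import torch

def cfg_combine(eps_uncond: torch.Tensor,
                eps_cond: torch.Tensor,
                guidance_scale: float) -> torch.Tensor:
    # Standard classifier-free guidance: extrapolate from the unconditional
    # noise prediction toward the conditional one.
    # guidance_scale == 1.0 reduces to the conditional prediction alone.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy usage with random stand-ins for the two UNet noise predictions.
eps_uncond = torch.randn(1, 4, 64, 64)  # unconditional branch
eps_cond = torch.randn(1, 4, 64, 64)    # conditional (reference-guided) branch
print(cfg_combine(eps_uncond, eps_cond, guidance_scale=3.5).shape)
```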
I'm confused too. In my case, images inferred from my custom stage-1 model are full of noise whenever cfg_guidance_scale > 1. I'm trying to train with CFG.