Moore-AnimateAnyone
Can anyone explain the classifier-free guidance in the ReferenceNet attention?
Why doesn't the unconditional predicted noise use the ReferenceNet features? This may cause a gap between training and inference: during training we only drop out the hidden states, not the ReferenceNet features. In practice, however, we noticed that enabling classifier-free guidance in the ReferenceNet attention performs better than disabling it; the generated video has better color. Can anyone explain this? Thanks.
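For context, here is a minimal sketch of the standard classifier-free guidance combination step; the names and tensor shapes are illustrative, not the repo's actual code. The train/inference gap described above concerns how `eps_uncond` is produced: if the ReferenceNet features are kept for the unconditional pass at inference but were never dropped during training, the two branches differ only in the dropped hidden states.

```python
import torch

def cfg_combine(eps_uncond: torch.Tensor,
                eps_cond: torch.Tensor,
                guidance_scale: float) -> torch.Tensor:
    # Standard classifier-free guidance: extrapolate from the unconditional
    # noise prediction toward the conditional one.
    # guidance_scale == 1.0 reduces to the conditional prediction alone.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy usage with random stand-ins for the two UNet noise predictions.
eps_uncond = torch.randn(1, 4, 64, 64)  # unconditional branch
eps_cond = torch.randn(1, 4, 64, 64)    # conditional (reference-guided) branch
print(cfg_combine(eps_uncond, eps_cond, guidance_scale=3.5).shape)
```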
I'm confused too. In my case, images inferred from my custom stage-1 model are full of noise whenever cfg_guidance_scale > 1. I'm trying to train with CFG.