Mask/box denoising hparams
What noise hyper-params are used in MaskDINO? Thanks!
Is there any noise added to "init anchors"?
What is "Unified denoising for mask" / "Unified DN" in Figure 1? Are there any special losses for it?
Does MaskDINO also use deformable attention? Both in encoder and in decoder?
Hey, we use the same noise hyper-params as in DINO. DINO is now open source, so you can refer to its code for more details. Yes, we add noise to the init anchors the same way as DINO. "Unified" means we train the model to reconstruct both the box and the mask, so there is an additional DN mask loss. Yes, deformable attention is used in both the encoder and the decoder, the same architecture as DINO.
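To illustrate what "adding noise to init anchors the same way as DINO" roughly involves, here is a minimal pure-Python sketch of DINO-style contrastive denoising, assuming boxes are normalized (cx, cy, w, h). Each ground-truth box yields a positive anchor (small corner jitter) and a negative anchor (larger jitter), and labels are randomly flipped. The function names, the defaults `box_noise_scale=0.4` and `label_noise_ratio=0.5`, the `ratio * 0.5` flip probability, and the corner-sorting step are assumptions for illustration, not values read from MaskDINO's or DINO's configs; check DINO's released code for the real settings.

```python
import random
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


def noise_box(box: Box, scale: float, negative: bool, rng: random.Random) -> Box:
    """Jitter each corner of a GT box by a random fraction of its half-size.

    Positive anchors move each corner by [0, 1) * half-size * scale;
    negative anchors move by [1, 2) * half-size * scale, i.e. farther away.
    """
    cx, cy, w, h = box
    corners = [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
    half_sizes = [w / 2, h / 2, w / 2, h / 2]
    noisy = []
    for corner, half in zip(corners, half_sizes):
        magnitude = rng.random() + (1.0 if negative else 0.0)
        sign = rng.choice((-1.0, 1.0))
        # Clamp to the normalized image extent.
        noisy.append(min(max(corner + sign * magnitude * half * scale, 0.0), 1.0))
    # Assumption of this sketch: sort so x0 <= x1 and y0 <= y1 even if
    # large noise made opposite corners cross.
    x0, x1 = sorted((noisy[0], noisy[2]))
    y0, y1 = sorted((noisy[1], noisy[3]))
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


def noise_label(label: int, num_classes: int, ratio: float, rng: random.Random) -> int:
    """With probability ratio * 0.5, replace the GT label with a random class."""
    if rng.random() < ratio * 0.5:
        return rng.randrange(num_classes)
    return label


def build_dn_queries(
    gt_boxes: Sequence[Box],
    gt_labels: Sequence[int],
    num_classes: int,
    box_noise_scale: float = 0.4,  # hypothetical default
    label_noise_ratio: float = 0.5,  # hypothetical default
    seed: int = 0,
) -> List[Dict]:
    """Create one positive and one negative noised query per GT object."""
    rng = random.Random(seed)
    queries = []
    for box, label in zip(gt_boxes, gt_labels):
        for negative in (False, True):
            queries.append({
                "label": noise_label(label, num_classes, label_noise_ratio, rng),
                "anchor": noise_box(box, box_noise_scale, negative, rng),
                "negative": negative,
                # Positives learn to reconstruct this box; negatives learn "no object".
                "target_box": box,
            })
    return queries


if __name__ == "__main__":
    for q in build_dn_queries([(0.5, 0.5, 0.2, 0.2)], [3], num_classes=80):
        print(q["negative"], q["label"], [round(v, 3) for v in q["anchor"]])
```

Drawing negatives from a strictly larger noise band is what makes the denoising "contrastive": the decoder has to tell near-GT anchors (reconstruct the box) from farther ones (predict no object).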
I suggest you read DINO; we share a similar model design, and it contains more implementation details.
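The reply says unified DN adds a mask reconstruction loss on top of box reconstruction. A minimal sketch of what such a combined loss over denoising queries could look like follows, assuming each positive DN query carries a predicted box, predicted per-pixel mask logits, and its GT box and mask. The choice of L1 for boxes and BCE plus Dice for masks, the weights, and the dict keys are all assumptions for illustration, not MaskDINO's exact loss; a fuller version would also add GIoU and a classification term (negatives supervised as "no object").

```python
import math
from typing import Dict, List, Sequence


def l1_box_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of absolute coordinate errors for one (cx, cy, w, h) box."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def _sigmoid(x: float) -> float:
    """Overflow-safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce_with_logits(logits: Sequence[float], target: Sequence[float]) -> float:
    """Mean per-pixel binary cross-entropy, computed stably from logits."""
    total = 0.0
    for x, y in zip(logits, target):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def dice_loss(logits: Sequence[float], target: Sequence[float], smooth: float = 1.0) -> float:
    """1 - soft Dice coefficient between sigmoid(logits) and the binary target."""
    probs = [_sigmoid(x) for x in logits]
    inter = sum(p * t for p, t in zip(probs, target))
    return 1.0 - (2.0 * inter + smooth) / (sum(probs) + sum(target) + smooth)


def unified_dn_loss(
    dn_queries: List[Dict],
    w_box: float = 5.0,  # hypothetical weights
    w_mask: float = 5.0,
    w_dice: float = 5.0,
) -> Dict[str, float]:
    """Reconstruction loss over positive DN queries: box term plus extra mask terms."""
    positives = [q for q in dn_queries if not q["negative"]]
    if not positives:
        return {"box": 0.0, "mask": 0.0, "total": 0.0}
    n = len(positives)
    box = w_box * sum(l1_box_loss(q["pred_box"], q["target_box"]) for q in positives) / n
    mask = sum(
        w_mask * bce_with_logits(q["pred_mask_logits"], q["target_mask"])
        + w_dice * dice_loss(q["pred_mask_logits"], q["target_mask"])
        for q in positives
    ) / n
    return {"box": box, "mask": mask, "total": box + mask}
```

Negative DN queries contribute nothing to the reconstruction terms here, since they have no box or mask to recover.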
Hi @FengLi-ust, thanks for your wonderful work. In Section 3.2 of the paper, third paragraph: “It does not support mask refinement as the mask positional prediction from one layer cannot pass to the next layer.” Does this mean that masks are also refined layer by layer in MaskDINO, like boxes? If so, how are the masks predicted by the previous layer passed to the current layer?
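For reference, the layer-by-layer box refinement the question compares against works roughly as in Deformable DETR and DINO: each decoder layer predicts an offset in inverse-sigmoid (logit) space, adds it to the previous layer's reference box, and maps back to [0, 1]. The sketch below shows only that box mechanism; it does not show how MaskDINO handles masks, which this thread does not answer. The per-layer deltas are hypothetical inputs standing in for the decoder's box-head outputs, and the `eps` clamp value is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def inverse_sigmoid(x: float, eps: float = 1e-5) -> float:
    """logit(x), with x clamped away from 0 and 1 for numerical safety."""
    x = min(max(x, eps), 1.0 - eps)
    return math.log(x / (1.0 - x))


def sigmoid(x: float) -> float:
    """Overflow-safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def refine_boxes(
    init_box: Sequence[float],
    per_layer_deltas: Sequence[Sequence[float]],
) -> List[Tuple[float, ...]]:
    """Return the (cx, cy, w, h) reference box after each decoder layer.

    Each layer adds its predicted offset to the previous box in logit space,
    then maps back to [0, 1]. In Deformable DETR the refined box is detached
    before it becomes the next layer's reference (DINO's "look forward twice"
    changes which gradients flow); plain floats here carry no gradients.
    """
    ref = list(init_box)
    history = []
    for delta in per_layer_deltas:
        ref = [sigmoid(inverse_sigmoid(r) + d) for r, d in zip(ref, delta)]
        history.append(tuple(ref))
    return history


if __name__ == "__main__":
    # Hypothetical offsets standing in for three decoder layers' box-head outputs.
    deltas = [(0.2, 0.0, 0.1, 0.0), (0.1, -0.1, 0.0, 0.0), (0.0, 0.0, 0.0, 0.05)]
    for i, box in enumerate(refine_boxes((0.5, 0.5, 0.2, 0.3), deltas), start=1):
        print(f"layer {i}:", [round(v, 4) for v in box])
```

Working in logit space keeps every refined coordinate inside (0, 1) no matter how large a layer's offset is.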