linear probe
In Table 6 and Section 4.2 of your paper, you show linear-probe results (even though I recognize that is not the major contribution of this work). The paper says that you use an "intermediate layer of ViT-B" for the linear probe. Can you specify which layer is used? Also, is the probe applied just to the [CLS] token output (which would be surprising, as that token has no loss applied to it during pretraining), or does it do something like average over the patch token outputs?
Hi @aaronsarna. We use the 8th layer of the ViT for linear evaluation. We do not use the [CLS] token; instead, we apply average pooling over the image patch tokens.
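In code terms, that evaluation recipe looks roughly like the sketch below. The `vit_apply` helper and the way per-layer outputs are exposed are placeholders for illustration, not our actual implementation:

```python
import jax.numpy as jnp

def probe_features(params, images, vit_apply, layer=8):
    # `vit_apply` is a placeholder that returns every transformer block's
    # token outputs, each of shape (batch, 1 + num_patches, dim),
    # with the [CLS] token first.
    layer_outputs = vit_apply(params, images)
    tokens = layer_outputs[layer - 1]      # output of the 8th block
    patch_tokens = tokens[:, 1:, :]        # drop [CLS]; no loss is applied to it
    return jnp.mean(patch_tokens, axis=1)  # average-pool patches to (batch, dim)

def linear_probe(features, w, b):
    # A single linear classifier trained on the pooled features.
    return features @ w + b
```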
Thanks, that was helpful.
I've been working on reproducing your ViT results in JAX, and so far the most critical piece in getting this to work has been removing the LayerNorm at the end of the ViT during pretraining. I don't see that mentioned anywhere in the paper, which made it very hard to spot; without that change, the linear probe performs essentially at chance. If you release a new version of the paper, it would be good to add that point.
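Concretely, the change is just making the final norm optional. A minimal sketch of what I mean (the names here are mine, not the repo's):

```python
def encode(x, blocks, final_layernorm=None):
    # `blocks` is the list of transformer blocks; `final_layernorm` is the
    # usual post-blocks LayerNorm. Both names are placeholders.
    for block in blocks:
        x = block(x)
    # For SimMIM pretraining, pass final_layernorm=None: with the final
    # LayerNorm applied, my linear probe was stuck at chance.
    if final_layernorm is not None:
        x = final_layernorm(x)
    return x
```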
I'm also curious whether the linear probe you use is just a linear layer, or whether you do what MAE does and add a BatchNorm before it. It seems at least necessary to add a LayerNorm to the output of ViT layer 8 to get decent performance.
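For reference, the head I'm describing would look something like this sketch (a parameter-free LayerNorm for simplicity; MAE instead uses a BatchNorm without affine parameters before the linear layer):

```python
import jax.numpy as jnp

def layernorm(x, eps=1e-6):
    # Parameter-free LayerNorm over the feature dimension.
    mean = jnp.mean(x, axis=-1, keepdims=True)
    var = jnp.var(x, axis=-1, keepdims=True)
    return (x - mean) / jnp.sqrt(var + eps)

def probe_head(pooled_features, w, b):
    # Normalize the layer-8 pooled features, then apply a single linear layer.
    return layernorm(pooled_features) @ w + b
```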
Hi @aaronsarna, I'm also curious about the linear probing. Could you share the settings of your final experiment? A BatchNorm before the linear layer doesn't seem to give a good result for me. Thanks a lot for your reply.
As I recall, the thing that helped the most for the SimMIM linear probe was to mask attention to the mask tokens in the ViT during pretraining. If you don't do that, then the unmasked images fed in during linear-probe training are effectively out of distribution.
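A sketch of what I mean (my own names; in a real ViT you would add this bias to the attention logits in every block during pretraining):

```python
import jax.numpy as jnp

def mask_token_attn_bias(is_mask_token, neg_inf=-1e9):
    # is_mask_token: (batch, num_tokens) bool, True where a patch embedding
    # was replaced by the learned mask token.
    # Returns a (batch, 1, 1, num_tokens) additive bias, broadcastable over
    # heads and query positions, that blocks attention *to* mask tokens.
    # At probe time there are no mask tokens, so visible tokens then see
    # the same attention distribution they saw during pretraining.
    return jnp.where(is_mask_token[:, None, None, :], neg_inf, 0.0)
```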
@aaronsarna Thanks for your reply. Do you mean that the SimMIM baseline can't reach the performance reported in the paper without manually blocking attention to the mask tokens?
I wasn't able to reproduce it without that change. Very possible I had some bug though.
Okay, thanks a lot for your help.