global pooling in ViT backbone

Open FoilHao opened this issue 2 years ago • 0 comments

Thanks for your great work. In the figure, there is a global pooling layer after the transformer encoder in the ViT backbone. In the original DINO implementation, only the CLS token is used for subsequent processing. Do you apply global pooling over all tokens instead of using the CLS token at this step?
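To make the question concrete, here is a minimal NumPy sketch of the two options I mean. It is not taken from this repository or from the DINO code. The token layout (CLS token at index 0, followed by the patch tokens), the shapes, and the helper names `cls_feature` and `global_avg_pool_feature` are all assumptions for illustration.

```python
import numpy as np


def cls_feature(tokens):
    """Return the CLS token as the image feature.

    tokens: array of shape (batch, 1 + num_patches, dim).
    Assumes the CLS token sits at index 0 of the sequence axis.
    """
    return tokens[:, 0]


def global_avg_pool_feature(tokens, include_cls=False):
    """Return the mean over tokens as the image feature.

    Whether the CLS token is part of the average is a design choice;
    by default this hypothetical helper averages the patch tokens only.
    """
    pooled_tokens = tokens if include_cls else tokens[:, 1:]
    return pooled_tokens.mean(axis=1)


# Stand-in for the encoder output: 2 images, 1 CLS + 196 patch tokens, dim 384.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 1 + 196, 384))

cls_out = cls_feature(tokens)              # shape (2, 384)
gap_out = global_avg_pool_feature(tokens)  # shape (2, 384)
print(cls_out.shape, gap_out.shape)
```

Both variants produce a feature of the same shape, so the figure alone does not tell me which one is fed to the next stage.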

Mar 30 '23 02:03 FoilHao