Self-Supervised-ViT-Path
global pooling in ViT backbone
Thanks for your great work. In the figure, there is a global pooling layer after the transformer encoder in the ViT backbone. In the original DINO implementation, only the CLS token is used for subsequent processing. I wonder whether you use global pooling over all tokens instead of the CLS token at this step.
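To make the two options concrete, here is a minimal NumPy sketch. It assumes the encoder output has shape (batch, 1 + num_patches, dim) with the CLS token at index 0. The function names `cls_pool` and `mean_pool` and the `include_cls` flag are hypothetical and are not taken from this repository or from DINO.

```python
import numpy as np


def cls_pool(tokens: np.ndarray) -> np.ndarray:
    """Select the CLS token embedding (assumed to sit at index 0).

    tokens: array of shape (batch, 1 + num_patches, dim), taken after the
    last transformer block.
    """
    return tokens[:, 0]


def mean_pool(tokens: np.ndarray, include_cls: bool = True) -> np.ndarray:
    """Global average pooling over the token axis.

    include_cls=True averages the CLS token together with the patch tokens;
    include_cls=False averages only the patch tokens (indices 1..N).
    """
    selected = tokens if include_cls else tokens[:, 1:]
    return selected.mean(axis=1)


# Toy input: batch of 2 images, CLS + 3 patch tokens, 4-dim embeddings.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 4, 4))

print(cls_pool(tokens).shape)   # (2, 4)
print(mean_pool(tokens).shape)  # (2, 4)
```

Both variants produce one `dim`-sized vector per image, so the downstream head is the same either way. The difference is only whether that vector comes from the single CLS token or from an average over tokens.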