BYOL-PyTorch
The code without syncbn will collapse
I noticed the paper "Momentum2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning". It is an interesting piece of work.

The results using plain BN in all layers do not collapse.
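For reference, switching between plain BN and SyncBN in PyTorch is a one-line conversion. This is a minimal sketch (the toy `nn.Sequential` model is a hypothetical stand-in for the BYOL encoder, not the code from this repo); at training time SyncBN additionally requires an initialized `torch.distributed` process group so that statistics are computed across all GPUs:

```python
import torch.nn as nn

# Toy backbone with plain BatchNorm layers (hypothetical stand-in
# for the BYOL encoder in this repo).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Replace every BatchNorm layer with SyncBatchNorm so that running
# statistics are computed over the whole global batch across GPUs
# (forward passes then need torch.distributed to be initialized).
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

print(type(sync_model[1]).__name__)  # SyncBatchNorm
```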
I suspect the results may come from the L2Norm being applied over both views together. Should we split the two views for testing?
Try increasing the weight decay, e.g. to 5e-4, and applying it to the BN parameters and biases as well. I have also tried adding the shuffling BN from MoCo, which helps a lot. The paper you mentioned adopted a weight decay of 1e-4 without LARS.
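A minimal sketch of that weight-decay suggestion (the toy model is a hypothetical stand-in, not this repo's encoder): many recipes build separate parameter groups to exclude BN weights and biases from decay, but applying decay to everything is simply the single-group default in PyTorch.

```python
import torch
import torch.nn as nn

# Toy model standing in for the BYOL encoder + projector (hypothetical).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 128),
)

# One parameter group: weight decay 5e-4 is applied to ALL parameters,
# including BN affine weights/biases and the conv/linear biases,
# as suggested above (lr/momentum values here are illustrative).
optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, weight_decay=5e-4)
```

To instead exclude BN and bias parameters (the more common recipe), one would pass two dicts to `SGD`, with `weight_decay=0.0` for the parameters whose shape is one-dimensional.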