segmenter
How much does ImageNet pre-training affect model performance?
Hi,
I am trying to use the baseline model (Linear decoder) described in the paper as a baseline for some of my work. However, I do not have access to pre-trained ImageNet weights, and without them my model is not able to learn, converging at around 0.25 mDICE on the Cityscapes training set. This is after hyperparameter optimisation across SGD and Adam with different learning-rate schedulers.
I was wondering whether, during your experiments, you saw similar levels of performance when you did not initialise your transformer backbones with pre-trained weights? Was this tested for both the baseline (ViT + Linear) and your proposed method (ViT + Mask)?
Thank you