
Swin V2 Fine-tuning a Fine-tuned Checkpoint

Open collinmccarthy opened this issue 3 years ago • 1 comment

Hello,

Let's say I want to fine-tune the swinv2_large_patch4_window12to24_192to384_22kto1k_ft pre-trained checkpoint for a new task/resolution. It was already fine-tuned on ImageNet-1k using PRETRAINED_WINDOW_SIZES: [ 12, 12, 12, 6 ], which makes sense given that the feature map height/width is [48, 24, 12, 6] at the start of the four stages for a 192x192 input and a patch size of 4.

But now that I'm fine-tuning again, should I update this to PRETRAINED_WINDOW_SIZES: [ 24, 24, 24, 12 ] (given feature map height/width [96, 48, 24, 12]), since that's what this checkpoint was fine-tuned at, or leave it as it was before?
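For reference, those per-stage sizes just fall out of the patch size and the three patch-merging (downsampling) steps between stages. A throwaway sanity check (stage_feature_sizes is a hypothetical helper, not something from the repo):

```python
# Patch embedding divides the input by patch_size; each of the three
# patch-merging layers then halves the resolution again.
def stage_feature_sizes(img_size: int, patch_size: int = 4, num_stages: int = 4):
    size = img_size // patch_size
    return [size // (2 ** i) for i in range(num_stages)]

print(stage_feature_sizes(192))  # [48, 24, 12, 6]
print(stage_feature_sizes(384))  # [96, 48, 24, 12]
```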

Thanks, -Collin

collinmccarthy · Aug 30 '22 23:08

After reviewing the code, it would make sense to leave it at whatever value was used when the "log-spaced continuous position bias" method was trained from scratch, so that would mean PRETRAINED_WINDOW_SIZES: [ 12, 12, 12, 6 ] in the scenario above.
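Roughly, the piece of the code I'm referring to looks like this (a paraphrased sketch of the coordinate normalization in WindowAttention, not the verbatim repo code):

```python
import math
import torch

def log_spaced_coords(window_size: int, pretrained_window_size: int = 0) -> torch.Tensor:
    """Sketch of the log-spaced relative-coordinate table that feeds the CPB MLP.

    The relative coordinates are normalized by the *pretrained* window size when
    it is > 0, otherwise by the current window size, then log-scaled.
    """
    h = torch.arange(-(window_size - 1), window_size, dtype=torch.float32)
    w = torch.arange(-(window_size - 1), window_size, dtype=torch.float32)
    table = torch.stack(torch.meshgrid(h, w, indexing="ij"), dim=-1)  # (2W-1, 2W-1, 2)

    denom = (pretrained_window_size - 1) if pretrained_window_size > 0 else (window_size - 1)
    table = table / denom   # normalize to roughly [-1, 1]
    table = table * 8       # rescale to [-8, 8]
    table = torch.sign(table) * torch.log2(torch.abs(table) + 1.0) / math.log2(8)
    return table
```

So as far as I can tell, PRETRAINED_WINDOW_SIZES only changes the denominator used to normalize the coordinates fed into the CPB MLP, which is why matching the value used when that MLP was originally trained seems like the safest choice.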

However, I tried both settings listed above as well as PRETRAINED_WINDOW_SIZES: [ 0, 0, 0, 0 ] and found no difference in model accuracy when fine-tuning on Cityscapes. My assumption is that this only matters at inference time, when the window size used for inference differs from the window size used during training. Please let me know if you've found the same thing to be true.
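For what it's worth, reusing the sketch above suggests why further fine-tuning might wash the setting out: different PRETRAINED_WINDOW_SIZES values only rescale the CPB MLP's inputs by a constant, and since the MLP keeps training it can presumably absorb that, whereas at pure inference time nothing adapts, so a mismatch there seems more likely to hurt.

```python
# Hypothetical check, reusing log_spaced_coords() from the sketch above:
# the tables differ only through the normalization denominator.
for pws in (0, 6, 12):
    table = log_spaced_coords(window_size=24, pretrained_window_size=pws)
    print(pws, table.abs().max().item())
```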

collinmccarthy · Sep 01 '22 23:09