Swin Transformer pre-trained Model?

Open JingyuLi-code opened this issue 3 years ago • 5 comments

Thanks for your work! Which pre-trained model did you use: the one trained on ImageNet-1K only, or the one pre-trained on ImageNet-22K and fine-tuned on ImageNet-1K? Have you compared the two? It would also be great if the source code were released.

JingyuLi-code avatar Mar 15 '22 07:03 JingyuLi-code

  1. pre-trained model: The latter. We adopted the Swin-L 1k model (which has an input size of 384x384 and a window size of 12) listed under "ImageNet-22K pre-trained models" in https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md. We have not actually compared the influence of different Swin backbones.

  2. code release: As soon as possible. Maybe I need to ask my supervisor.
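
For reference, here is a minimal sketch of what this backbone choice implies for the grid features the captioning model receives. The configuration values (patch size 4, three patch-merging steps, first-stage widths of 128 for Swin-B and 192 for Swin-L) come from the Swin Transformer paper rather than from this thread, and the function name `grid_feature_shape` is hypothetical.

```python
# First-stage channel width per Swin variant (assumed from the Swin Transformer paper).
EMBED_DIM = {"swin_b": 128, "swin_l": 192}

PATCH_SIZE = 4   # each 4x4 pixel patch becomes one token
NUM_MERGES = 3   # patch merging between the 4 stages halves height and width


def grid_feature_shape(variant, input_size=384, window_size=12):
    """Return (num_tokens, channels) of the final-stage feature map."""
    side = input_size // PATCH_SIZE              # 384 // 4 = 96 tokens per side
    channels = EMBED_DIM[variant]
    for stage in range(NUM_MERGES + 1):
        # Window attention partitions the map into window_size x window_size tiles,
        # so every stage resolution has to divide evenly.
        if side % window_size != 0:
            raise ValueError(f"stage {stage}: {side} not divisible by {window_size}")
        if stage < NUM_MERGES:
            side //= 2                           # patch merging halves each side
            channels *= 2                        # and doubles the channel width
    return side * side, channels


print(grid_feature_shape("swin_l"))  # (144, 1536)
print(grid_feature_shape("swin_b"))  # (144, 1024)
```

Under these assumptions, both 384x384 backbones yield a 12x12 grid (144 tokens); only the channel width differs.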

232525 avatar Mar 16 '22 02:03 232525

Thanks! Actually, I would like to know how the results compare between the "Regular ImageNet-1K trained models" and the "ImageNet-22K pre-trained models", because the region-feature and grid-feature extractors are pre-trained on Regular ImageNet-1K. ImageNet-22K contains more images than ImageNet-1K; will that cause a huge difference?

JingyuLi-code avatar Mar 16 '22 03:03 JingyuLi-code

There must be a difference, but whether it is huge needs experimental verification. It seems that the Swin Transformer authors did not release a Swin-L model pre-trained on Regular ImageNet-1K. I am running a simple experiment that trains our PureT with a Swin-B backbone (input size 384x384, window size 12) pre-trained on Regular ImageNet-1K; the result may take a couple of days.

232525 avatar Mar 16 '22 06:03 232525

Hi, thanks for your reply! I would like to know the result of using the Swin-B backbone pre-trained on Regular ImageNet-1K.

JingyuLi-code avatar Mar 22 '22 09:03 JingyuLi-code

The result is bad, even worse than using Bottom-Up region features under XE loss, so I did not continue training it under SCST. That is abnormal! I guess there were some mistakes, or the training process needs some modification, but I do not have enough free time to look into this now (I will try later when I am free). I have released the code, so maybe you can re-train it. However, I have also trained the model using the ImageNet-22K Swin-B, and the result is normal (B1: 81.3, B2: 66.3, B3: 52.0, B4: 39.9, M: 29.9, R: 59.6, C: 136.6, S: 23.8).
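
For anyone re-training and comparing against these numbers, the following is a small sketch that turns the score string above into a dictionary. The abbreviation-to-metric mapping follows the usual image captioning convention and is an assumption here, as is the helper name `parse_scores`.

```python
import re

# Usual captioning abbreviations (assumed; the thread does not spell them out).
METRIC_NAMES = {
    "B1": "BLEU-1", "B2": "BLEU-2", "B3": "BLEU-3", "B4": "BLEU-4",
    "M": "METEOR", "R": "ROUGE-L", "C": "CIDEr", "S": "SPICE",
}


def parse_scores(text):
    """Parse 'B1: 81.3, B2: 66.3, ...' into {metric name: float score}."""
    pairs = re.findall(r"([A-Z]\d?):\s*([\d.]+)", text)
    return {METRIC_NAMES[key]: float(value) for key, value in pairs}


# Scores reported above for the ImageNet-22K Swin-B backbone.
reported = "B1: 81.3, B2: 66.3, B3: 52.0, B4: 39.9, M: 29.9, R: 59.6, C: 136.6, S: 23.8"
scores = parse_scores(reported)
print(scores["CIDEr"])  # 136.6
```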

232525 avatar Mar 22 '22 11:03 232525