Swin-V2 addition with ImageNet22K pre-training
🚀 The feature
Thanks for adding the Swin backbone to the repository. I believe the added version is the V1 version of the Swin Transformer. Would it be possible to also add Swin-V2 with ImageNet22K pre-training?
Motivation, pitch
The motivation behind this request is that Swin-V2 with ImageNet22K pre-training is used in recent SOTA results. If this addition is completed, the torchvision community will be able to replicate those results.
Alternatives
No response
Additional context
No response
cc @datumbox
Thanks for the recommendation @artest08. All credit for the addition of the Swin architecture goes to @xiaohu2015, who wrote the architecture, and @jdsgomes, who trained it. :)
I agree Swin-V2 is a worthwhile addition. Back when @xiaohu2015 implemented it, the official V2 code was not public yet, so we would have had to guess some of the internal details. Now that it's out, I think it's worth doing. If possible, I would favour extending the existing class to support it, similar to what we did with EfficientNets V1 and V2 (see the sketch below). But if the changes are too many, we can take the approach we took with MobileNets V2 and V3.
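To illustrate the EfficientNet-style approach, here is a minimal sketch of what "extend the existing class" could look like. All names here (`SwinBlockV1`, `SwinBlockV2`, the `block` constructor argument) are illustrative assumptions, not torchvision's actual API, and the blocks are toy stand-ins rather than real windowed-attention blocks:

```python
import torch
from torch import nn


class SwinBlockV1(nn.Module):
    """Toy stand-in for the existing V1 block (pre-norm residual)."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Linear(dim, dim)

    def forward(self, x):
        # V1 applies the norm before the sublayer (pre-norm).
        return x + self.mlp(self.norm(x))


class SwinBlockV2(SwinBlockV1):
    """Toy stand-in for a V2 block; Swin-V2 moves to post-norm residuals."""

    def forward(self, x):
        # V2 applies the norm after the sublayer (post-norm).
        return x + self.norm(self.mlp(x))


class SwinTransformer(nn.Module):
    """One model class; the block type is a constructor argument."""

    def __init__(self, dim: int, depth: int, block=SwinBlockV1):
        super().__init__()
        self.blocks = nn.Sequential(*[block(dim) for _ in range(depth)])

    def forward(self, x):
        return self.blocks(x)


# Both model builders would then share the same implementation:
def swin_t():
    return SwinTransformer(dim=96, depth=4, block=SwinBlockV1)


def swin_v2_t():
    return SwinTransformer(dim=96, depth=4, block=SwinBlockV2)


x = torch.randn(2, 49, 96)  # (batch, tokens, channels)
print(swin_v2_t()(x).shape)  # torch.Size([2, 49, 96])
```

If the V2 differences end up touching many internals (e.g. scaled cosine attention and the log-spaced continuous relative position bias), the separate-class MobileNet V2/V3 style may turn out cleaner than threading a block parameter through.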
We haven't finished the roadmap for 2022H2, but adding Swin-V2 is at the top of the shortlisted items. We would also love to get a community contribution for it. Our team can always support with reviews and with training the models.
I'm currently working on it in #6246.