mmselfsup: How to start the downstream task of object detection on CAE?

@YuanLiuuuuuu and team, thanks for the wonderful work!

I was able to run the pretext task of Context Autoencoder for Self-Supervised Representation Learning (CAE) on my custom set of images. I then ran the downstream image classification task using the linear evaluation config shared here.

Now I want to use the encoder weights for the downstream task of object detection.

How should I go about it? What changes do I need to make in the config?
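A first step in any detection transfer is isolating the encoder weights from the full CAE checkpoint, which also holds the teacher, neck and head. mmselfsup 0.x ships a converter for this (`tools/model_converters/extract_backbone_weights.py`); the minimal sketch below only illustrates what such a conversion does, assuming the encoder is stored under a top-level `backbone.` key prefix. The function name `extract_backbone_weights` here is a hypothetical stand-in, not the script's API.

```python
from collections import OrderedDict


def extract_backbone_weights(state_dict, prefix="backbone."):
    """Keep only the keys under `prefix` and strip that prefix.

    Assumption: the CAE checkpoint stores the ViT encoder under `backbone.`;
    teacher, neck and head weights are simply dropped.
    """
    backbone = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            backbone[key[len(prefix):]] = value
    return backbone


if __name__ == "__main__":
    # Toy stand-in for checkpoint["state_dict"]; real values would be tensors.
    ckpt_state = {
        "backbone.patch_embed.projection.weight": "w0",
        "backbone.layers.0.attn.qkv.weight": "w1",
        "teacher.layers.0.attn.qkv.weight": "t1",
        "neck.decoders.0.weight": "n0",
    }
    print(list(extract_backbone_weights(ckpt_state)))
```

With a real checkpoint you would pass `torch.load(path, map_location="cpu")["state_dict"]` and save the result with `torch.save`; the detector's backbone then has to use the same parameter names as the CAE encoder.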

Aug 23 '22 15:08 letdivedeep

Currently, MMDet does not support ViT for Mask R-CNN. But you can follow this PR to make some customized modifications for CAE. Thanks!
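The core mismatch is that a plain ViT such as the CAE encoder outputs a single feature map at stride `patch_size` (16), while Mask R-CNN's FPN expects maps at strides 4, 8, 16 and 32. One common workaround, used in BEiT/MAE-style detection ports, is to reshape the patch tokens into a 2-D map and resample it: 4x up (e.g. two stride-2 transposed convs), 2x up, identity, and 2x down (e.g. max-pool). The pure-Python sketch below only computes the resulting shapes; `simple_pyramid_shapes` is a hypothetical helper, and it is not claimed to match the linked PR exactly.

```python
def simple_pyramid_shapes(img_h, img_w, patch_size=16, embed_dim=768,
                          target_strides=(4, 8, 16, 32)):
    """Return (stride, resample_factor, (C, H, W)) for each pyramid level.

    Hypothetical helper: the ViT token grid has stride `patch_size`; each level
    is obtained by resampling that grid by patch_size / stride (>1 means
    upsample, <1 means downsample).
    """
    if img_h % patch_size or img_w % patch_size:
        raise ValueError("input size must be divisible by patch_size")
    grid_h, grid_w = img_h // patch_size, img_w // patch_size
    levels = []
    for stride in target_strides:
        factor = patch_size / stride
        # Integer arithmetic floors on downsampling, like a 2x2 max-pool does.
        h = grid_h * patch_size // stride
        w = grid_w * patch_size // stride
        levels.append((stride, factor, (embed_dim, h, w)))
    return levels


if __name__ == "__main__":
    for stride, factor, shape in simple_pyramid_shapes(1024, 1024):
        print(f"stride {stride:2d}: resample x{factor:g} -> {shape}")
```

Because every level keeps the ViT width (`embed_dim`), the FPN's `in_channels` would be set to that width for all four inputs.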

Aug 24 '22 01:08 YuanLiuuuuuu

If you have any other questions, please feel free to reopen it. Thanks!

Sep 08 '22 06:09 YuanLiuuuuuu

Hi @YuanLiuuuuuu

Thanks for the reply. Can I use the pretrained ViT encoder as the backbone with the YOLOX head, or with any single-stage detector head? If so, what modifications are needed to achieve this? Thanks

Sep 11 '22 15:09 letdivedeep

You can follow this PR to make some custom modifications. Thanks!

Sep 14 '22 02:09 YuanLiuuuuuu

@YuanLiuuuuuu Thanks for the input. I am trying to follow the PR, but it is for Mask R-CNN. What needs to be changed to make the ViT backbone work with YOLOX?

Any pointers would be helpful. Thanks in advance!

Sep 15 '22 13:09 letdivedeep