How to start the downstream object detection task with CAE?
@YuanLiuuuuuu and team, thanks for the wonderful work.
I was able to run the pretext task of Context Autoencoder for Self-Supervised Representation Learning (CAE) on my custom set of images, and then run the downstream image classification task using the linear eval config shared here.
Now I want to use the encoder weights for the downstream object detection task.
How should I go about it, and what changes do I need to make in the config?
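Reusing the encoder weights usually starts with separating the backbone from the rest of the pre-training checkpoint. Below is a minimal sketch of that step on a plain dictionary. It assumes the encoder parameters are stored under a `backbone.` key prefix; that prefix and the toy key names are assumptions for illustration, not taken from the actual CAE checkpoint.

```python
from collections import OrderedDict


def extract_backbone_weights(state_dict, prefix="backbone."):
    """Keep only entries whose key starts with `prefix`, with the prefix removed."""
    backbone = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            backbone[key[len(prefix):]] = value
    if not backbone:
        raise KeyError(f"no keys start with {prefix!r}")
    return backbone


# Toy checkpoint: hypothetical key names, integers standing in for tensors.
toy_state_dict = {
    "backbone.patch_embed.projection.weight": 1,
    "backbone.layers.0.attn.qkv.weight": 2,
    "neck.decoder.weight": 3,
}

encoder_only = extract_backbone_weights(toy_state_dict)
print(list(encoder_only))
```

With a real checkpoint, the dictionary would come from `torch.load(path, map_location="cpu")`, where the weights typically sit under a `state_dict` key. Printing a few keys first is a quick way to confirm which prefix the encoder actually uses.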
Currently, MMDet does not support ViT for Mask R-CNN, but you can follow this PR to make some customized modifications for CAE. Thanks!
If you have any other questions, please feel free to reopen it. Thanks!
Hi @YuanLiuuuuuu
Thanks for the reply. Can I use the pretrained ViT encoder as the backbone with the YOLOX head, or any other single-stage detector head? If so, what modifications are required to achieve this? Thanks.
You can follow this PR to make some custom modifications. Thanks!
@YuanLiuuuuuu Thanks for the input. I am trying to follow the PR, but it is for Mask R-CNN. What needs to be changed to make the ViT backbone work with YOLOX?
Any pointers would be helpful. Thanks in advance.
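One part of the gap is shape bookkeeping: a plain ViT produces a single grid of patch tokens, while the YOLOX head in MMDet, as far as I know, expects three feature maps at strides 8, 16, and 32 by default. The sketch below only works out the sizes and resampling factors involved. It assumes a patch size of 16 and a square 640-pixel input, and all function names are hypothetical. It is not the approach taken in the PR.

```python
def vit_grid_size(img_size, patch_size=16):
    """Side length of the patch-token grid for a square input image."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    return img_size // patch_size


def resample_plan(img_size, patch_size=16, target_strides=(8, 16, 32)):
    """For each target stride, the factor by which the ViT grid must be resized."""
    grid = vit_grid_size(img_size, patch_size)
    plan = {}
    for stride in target_strides:
        scale = patch_size / stride  # >1 means upsample, <1 means downsample
        plan[stride] = {"scale": scale, "size": int(grid * scale)}
    return plan


plan = resample_plan(640)
for stride, info in plan.items():
    print(f"stride {stride}: resize x{info['scale']} -> {info['size']}x{info['size']}")
```

In a real backbone, the token sequence of shape (B, N, C) would first have any class token dropped and be reshaped to (B, C, H, W) with H = W = the grid size. The resizing itself could then be done with, for example, a transposed convolution for upsampling and pooling for downsampling, before the maps are passed to the neck.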