Config files for F-ViT from OpenAI-CLIP
Hi, thank you for your great work. I have two questions.
Firstly, would it be possible to share the following config files for training F-ViT models from OpenAI-CLIP rather than EVA-CLIP? (The file names below are just examples.)

- fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_openai_original.py
- fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_openai_clipself_patches.py
- ...

Specifically, fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_openai_clipself_patches.py would use the pre-trained weights of openai_vitb16_coco_clipself_patches.pt for initialization, instead of eva_vitb16_coco_clipself_patches.pt.
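For reference, I imagine the backbone sections of the two configs would differ roughly as sketched below; the field names (type, model_name, pretrained) are my own guesses based on how I would wire this up, and may not match your actual implementation:

# openai_original variant (hypothetical sketch)
backbone = dict(
    type='CLIPViT',        # assumed name for an OpenAI-CLIP ViT backbone
    model_name='ViT-B-16',
    pretrained='openai')   # original OpenAI-CLIP weights

# openai_clipself_patches variant (hypothetical sketch)
backbone = dict(
    type='CLIPViT',
    model_name='ViT-B-16',
    pretrained='checkpoints/openai_vitb16_coco_clipself_patches.pt')  # CLIPSelf-distilled weights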
Secondly, to reproduce openai_vitb16_coco_clipself_patches.pt (i.e., a CLIPSelf pre-trained ViT distilled from OpenAI-CLIP instead of EVA-CLIP), which model_name did you use? For example, when generating text embeddings for OpenAI-CLIP, are the following settings identical to yours?
python tools/dump_coco_openclip_feature.py \
    ... \
    --model_name ViT-B-16 \
    --pretrained openai \
    ...
# where --model_name is ViT-B-16 instead of EVA02-CLIP-B-16,
# and --pretrained is openai instead of eva.
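On my side, I currently generate the text embeddings roughly as follows (a minimal sketch with open_clip; the prompt template and category list are simplified placeholders, not the exact logic of dump_coco_openclip_feature.py):

import torch
import open_clip

# Build the OpenAI-pretrained ViT-B/16 (instead of EVA02-CLIP-B-16 / eva).
model, _, _ = open_clip.create_model_and_transforms('ViT-B-16', pretrained='openai')
tokenizer = open_clip.get_tokenizer('ViT-B-16')
model.eval()

# Placeholder category names; the real script reads the OV-COCO category list.
categories = ['person', 'bicycle', 'car']
prompts = [f'a photo of a {c}' for c in categories]

with torch.no_grad():
    tokens = tokenizer(prompts)                  # (num_categories, 77)
    text_embeddings = model.encode_text(tokens)  # (num_categories, 512) for ViT-B-16
    text_embeddings /= text_embeddings.norm(dim=-1, keepdim=True)

print(text_embeddings.shape)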
Thanks for your help.
Hi! Please refer to the scripts in another work of mine. I believe those will help.
Hi, thank you for the useful information.
Following your suggestion, I took the following steps:

- Based on the repository you provided above, I created models/clip_vit.py.
- I also created configs/ov_coco/fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_openai_original.py, as shown below:
model = dict(
    type='FViT',
    backbone=dict(
        type='CLIPViT',
        model_name='ViT-B-16',
        pretrained='openai',
        cache_dir='checkpoints',
        norm_cfg=norm_cfg,
        out_indices=[3, 5, 7, 11]),
    neck=dict(
        ...
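To sanity-check the backbone arguments before training, I also ran a quick check with open_clip (a minimal sketch; it only verifies that the model builds with these arguments and that out_indices are valid for a 12-block ViT-B/16):

import open_clip

# Build the same OpenAI ViT-B/16 that my CLIPViT backbone is meant to wrap,
# caching the downloaded weights under the directory used in the config.
model, _, _ = open_clip.create_model_and_transforms(
    'ViT-B-16', pretrained='openai', cache_dir='checkpoints')

# The config taps features at out_indices=[3, 5, 7, 11]; ViT-B/16 has 12
# residual attention blocks, so all of these indices should be in range.
blocks = model.visual.transformer.resblocks
assert len(blocks) == 12
assert all(0 <= i < len(blocks) for i in [3, 5, 7, 11])
print('OpenAI ViT-B/16 built successfully; out_indices are valid.')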
I hope this works as expected. Thank you.