Config files for F-ViT from OpenAI-CLIP
Hi, thank you for your great work. I have two questions.
Firstly, would it be possible to share the following config files for training F-ViT models from OpenAI-CLIP rather than EVA-CLIP? (The file names below are just examples.)

- fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_openai_original.py
- fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_openai_clipself_patches.py
- ...

Specifically, fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_openai_clipself_patches.py would use the pre-trained weights of openai_vitb16_coco_clipself_patches.pt for initialization, instead of eva_vitb16_coco_clipself_patches.pt.
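For reference, I imagine the backbone sections of the two configs would differ roughly as sketched below; the field names (type, model_name, pretrained) are my own guesses based on how I would wire this up, and may not match your actual implementation:

# openai_original variant (hypothetical sketch)
backbone = dict(
    type='CLIPViT',        # assumed name for an OpenAI-CLIP ViT backbone
    model_name='ViT-B-16',
    pretrained='openai')   # original OpenAI-CLIP weights

# openai_clipself_patches variant (hypothetical sketch)
backbone = dict(
    type='CLIPViT',
    model_name='ViT-B-16',
    pretrained='checkpoints/openai_vitb16_coco_clipself_patches.pt')  # CLIPSelf-distilled weights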
Secondly, to reproduce openai_vitb16_coco_clipself_patches.pt (i.e., a CLIPSelf pre-trained ViT distilled from OpenAI-CLIP instead of EVA-CLIP), which model_name did you use? For example, when generating text embeddings for OpenAI-CLIP, are the following settings identical to yours?
python tools/dump_coco_openclip_feature.py \
    ... \
    --model_name ViT-B-16 \
    --pretrained openai \
    ...
# where --model_name is ViT-B-16 instead of EVA02-CLIP-B-16,
# and --pretrained is openai instead of eva.
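On my side, I currently generate the text embeddings roughly as follows (a minimal sketch with open_clip; the prompt template and category list are simplified placeholders, not the exact logic of dump_coco_openclip_feature.py):

import torch
import open_clip

# Build the OpenAI-pretrained ViT-B/16 (instead of EVA02-CLIP-B-16 / eva).
model, _, _ = open_clip.create_model_and_transforms('ViT-B-16', pretrained='openai')
tokenizer = open_clip.get_tokenizer('ViT-B-16')
model.eval()

# Placeholder category names; the real script reads the OV-COCO category list.
categories = ['person', 'bicycle', 'car']
prompts = [f'a photo of a {c}' for c in categories]

with torch.no_grad():
    tokens = tokenizer(prompts)                  # (num_categories, 77)
    text_embeddings = model.encode_text(tokens)  # (num_categories, 512) for ViT-B-16
    text_embeddings /= text_embeddings.norm(dim=-1, keepdim=True)

print(text_embeddings.shape)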
Thanks for your help.
Hi! Please refer to the scripts in another work of mine. I believe those will help.
Hi, thank you for the useful information.
Following your suggestion, I took the following steps:

- Based on the repository you provided above, I created models/clip_vit.py.
- I also created configs/ov_coco/fvit_vitb16_upsample_fpn_bs64_3e_ovcoco_openai_original.py, as shown below:
model = dict(
    type='FViT',
    backbone=dict(
        type='CLIPViT',
        model_name='ViT-B-16',
        pretrained='openai',
        cache_dir='checkpoints',
        norm_cfg=norm_cfg,
        out_indices=[3, 5, 7, 11]),
    neck=dict(
        ...
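To sanity-check the backbone arguments before training, I also ran a quick check with open_clip (a minimal sketch; it only verifies that the model builds with these arguments and that out_indices are valid for a 12-block ViT-B/16):

import open_clip

# Build the same OpenAI ViT-B/16 that my CLIPViT backbone is meant to wrap,
# caching the downloaded weights under the directory used in the config.
model, _, _ = open_clip.create_model_and_transforms(
    'ViT-B-16', pretrained='openai', cache_dir='checkpoints')

# The config taps features at out_indices=[3, 5, 7, 11]; ViT-B/16 has 12
# residual attention blocks, so all of these indices should be in range.
blocks = model.visual.transformer.resblocks
assert len(blocks) == 12
assert all(0 <= i < len(blocks) for i in [3, 5, 7, 11])
print('OpenAI ViT-B/16 built successfully; out_indices are valid.')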
I hope this works as expected. Thank you.