Fine-tuning OpenAI's CLIP model

Open murat-gunay opened this issue 3 years ago • 7 comments

Hi @NielsRogge, you are doing an amazing job. We have implemented your tutorials in various use cases. I just wanted to know if you could provide a fine-tuning notebook/script for OpenAI's CLIP model using custom datasets, covering the following tasks and more: 1) image captioning, 2) zero-shot image classification.

thanks

murat-gunay avatar Dec 28 '22 14:12 murat-gunay

Hi,

Thanks for the kind words :)

CLIP cannot really be used for image captioning out-of-the-box, as it only consists of 2 encoders (a vision and a text encoder). There are various works leveraging CLIP for image captioning, like CLIP prefix captioning which trains a minimal MLP on top of a frozen CLIP.

However for state-of-the-art image captioning models, I'd recommend to check out BLIP and GIT, both of which were just added to 🤗 Transformers. Fine-tuning tutorials for those will be released. I made a 🤗 Space to compare their captioning quality: https://huggingface.co/spaces/nielsr/comparing-captioning-models.

To fine-tune CLIP on additional image-text pairs (for tasks like zero-shot image classification), I'd recommend this blog: https://huggingface.co/blog/fine-tune-clip-rsicd
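As a rough illustration of how the two encoders are used for zero-shot classification: the image embedding is compared with one text embedding per candidate label, and the scaled cosine similarities are turned into probabilities with a softmax. The sketch below is only a toy, assuming hand-written vectors in place of real encoder outputs; `zero_shot_classify`, the example embeddings, and the `logit_scale` value are illustrative assumptions, not code from the blog post.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, text_embs, logit_scale=100.0):
    """Return one probability per candidate label for a single image embedding.

    logit_scale is an assumed temperature; a real model learns its own value.
    """
    img = l2_normalize(image_emb)
    logits = []
    for t in text_embs:
        txt = l2_normalize(t)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    return softmax(logits)

# Toy stand-ins for encoder outputs (hypothetical values, not real CLIP embeddings).
labels = ["a photo of a cat", "a photo of a dog"]
text_embs = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.2]]
image_emb = [0.9, 0.1, 0.2]

probs = zero_shot_classify(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best, probs)
```

Fine-tuning on additional image-text pairs changes the embeddings the encoders produce; the scoring step itself stays the same.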

NielsRogge avatar Jan 04 '23 10:01 NielsRogge

Hey, can you please share the code to fine-tune CLIP for image retrieval using the Flickr8k dataset?

abhinavbenagi avatar May 07 '23 11:05 abhinavbenagi

Hi,

We do provide a script to fine-tune CLIP and similar models on an (image, text) dataset here: https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text.
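For intuition about what contrastive training on (image, text) pairs optimizes, here is a minimal sketch of the symmetric cross-entropy loss described in the CLIP paper: within a batch, each image should score highest against its own caption, and each caption against its own image. This is a plain-Python toy on a hand-written similarity matrix, not the code of the linked script; `clip_contrastive_loss` and the example matrices are hypothetical names and values.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_contrastive_loss(logits):
    """Symmetric cross-entropy over a square similarity matrix.

    logits[i][j] is the scaled similarity of image i and text j;
    the matching pairs are assumed to sit on the diagonal.
    """
    n = len(logits)
    # Image -> text: each row should put its mass on its own column.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> image: same computation on the transposed matrix.
    cols = [list(col) for col in zip(*logits)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

aligned = [[10.0, 0.0], [0.0, 10.0]]   # matching pairs score highest
shuffled = [[0.0, 10.0], [10.0, 0.0]]  # mismatched pairs score highest

print(clip_contrastive_loss(aligned))
print(clip_contrastive_loss(shuffled))
```

The loss is close to zero when the diagonal dominates and grows when mismatched pairs score higher, which is the signal fine-tuning uses to pull matching images and captions together.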

Alternatively, have a look at the OpenCLIP repository, which also provides scripts to train CLIP yourself: https://github.com/mlfoundations/open_clip.

NielsRogge avatar May 07 '23 15:05 NielsRogge

Hi @NielsRogge , thanks for the super informative and clear post!

About this: "I'd recommend to check out BLIP and GIT, both of which were just added to 🤗 Transformers. Fine-tuning tutorials for those will be released."

Do you have any news? : )

best

alelordelo avatar May 25 '23 09:05 alelordelo

Hey, I have used CLIP to perform an image retrieval task. Can you suggest some ways to evaluate the image retrieval procedure?

Thank you.

abhinavbenagi avatar Jun 28 '23 12:06 abhinavbenagi

A task similar to image retrieval is ReID (re-identification). There is a repo named CLIP-ReID, so you can take the evaluation metrics from there, along with some more ideas if needed.
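One metric commonly reported for retrieval is Recall@K: the fraction of queries for which at least one correct item appears among the top K results. The sketch below is a minimal plain-Python version, assuming a precomputed query-by-gallery similarity matrix; the function name `recall_at_k` and the toy scores are illustrative assumptions, not taken from CLIP-ReID.

```python
def recall_at_k(similarity, relevant, k):
    """Fraction of queries with at least one relevant item in the top k.

    similarity[q][i]: score of gallery item i for query q (higher is better).
    relevant[q]: set of gallery indices that count as correct for query q.
    """
    hits = 0
    for q, scores in enumerate(similarity):
        # Rank gallery indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if any(i in relevant[q] for i in ranked[:k]):
            hits += 1
    return hits / len(similarity)

# Toy scores (hypothetical): two text queries against three gallery images.
similarity = [
    [0.9, 0.1, 0.3],  # query 0: correct image 0 is ranked first
    [0.2, 0.4, 0.8],  # query 1: correct image 1 is ranked second
]
relevant = [{0}, {1}]

print(recall_at_k(similarity, relevant, 1))  # 0.5
print(recall_at_k(similarity, relevant, 2))  # 1.0
```

With CLIP, the similarity matrix would come from cosine similarities between text and image embeddings; reporting Recall@1, Recall@5, and Recall@10 together is a common convention.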

Mohit-robo avatar Jul 17 '23 07:07 Mohit-robo

I came from the Discord server. I'm here to help :))

jpazv avatar Aug 15 '23 13:08 jpazv