Fine-tuning OpenAI's CLIP model
Hi @NielsRogge, you are doing an amazing job. We have implemented your tutorials in various use cases. I just wanted to know if you could provide a fine-tuning notebook/script for OpenAI's CLIP model on a custom dataset, covering the following tasks and more: 1) image captioning, 2) zero-shot image classification.
Thanks
Hi,
Thanks for the kind words :)
CLIP cannot really be used for image captioning out-of-the-box, as it only consists of two encoders (a vision encoder and a text encoder) and has no text decoder to generate captions. There are various works leveraging CLIP for image captioning, like CLIP prefix captioning, which trains a small MLP on top of a frozen CLIP.
However, for state-of-the-art image captioning models, I'd recommend checking out BLIP and GIT, both of which were just added to 🤗 Transformers. Fine-tuning tutorials for those will be released. I made a 🤗 Space to compare their captioning quality: https://huggingface.co/spaces/nielsr/comparing-captioning-models.
To fine-tune CLIP on additional image-text pairs (for tasks like zero-shot image classification), I'd recommend this blog: https://huggingface.co/blog/fine-tune-clip-rsicd
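To make the zero-shot classification part concrete, here is a minimal sketch of how CLIP-style scoring works once you have embeddings: normalize the image and text embeddings, take cosine similarities, scale them, and apply a softmax over the candidate prompts. The toy vectors, the prompt strings, and the `logit_scale=100.0` value are assumptions for illustration; in practice the embeddings come from the model's vision and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and N prompts.

    logit_scale is an assumed temperature; a trained model supplies its own.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    # Subtract the max logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical prompts and toy 3-d embeddings (real CLIP embeddings are much larger).
labels = ["a photo of a cat", "a photo of a dog"]
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best, probs)
```

Fine-tuning on your own image-text pairs changes the embeddings, not this scoring step, so the same procedure applies before and after fine-tuning.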
Hey, can you please share the code to fine-tune CLIP for image retrieval using the Flickr8k dataset?
Hi,
We do provide a script to fine-tune CLIP and similar models on an (image, text) dataset here: https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text.
Alternatively, have a look at the OpenCLIP repository, which also provides scripts to train CLIP yourself: https://github.com/mlfoundations/open_clip.
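For context, CLIP-style training uses a symmetric contrastive loss over a batch of (image, text) pairs: each image should match its own caption and vice versa. Below is a rough pure-Python sketch of that objective, not the code from either repository. The function names and the `logit_scale` default are assumptions for illustration, and `sim[i][j]` is assumed to hold the cosine similarity between image `i` and text `j`.

```python
import math


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)
        # Numerically stable log-sum-exp of the row.
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(sim, logit_scale=100.0):
    """Average of the image-to-text and text-to-image cross-entropy losses."""
    logits = [[logit_scale * s for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # transpose: text-to-image
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Toy similarity matrices for a batch of two pairs.
matched = [[1.0, 0.0], [0.0, 1.0]]      # each image is closest to its own text
mismatched = [[0.0, 1.0], [1.0, 0.0]]   # each image is closest to the wrong text

print(clip_contrastive_loss(matched), clip_contrastive_loss(mismatched))
```

The loss is low when the diagonal of the similarity matrix dominates, which is what fine-tuning on your own pairs pushes towards.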
Hi @NielsRogge, thanks for the super informative and clear post!
About this: "I'd recommend to check out BLIP and GIT, both of which were just added to 🤗 Transformers. Fine-tuning tutorials for those will be released."
Do you have any news? :)
Best
Hey, I have used CLIP for an image retrieval task. Can you suggest some ways to evaluate the image retrieval results?
Thank you.
A similar task to image retrieval is re-identification (ReID). There is a repo named CLIP-ReID, so you can look at its evaluation metrics and get some more ideas there if needed.
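If you just want something quick, two metrics commonly reported for retrieval are Recall@K and mean reciprocal rank (MRR). Here is a small sketch, assuming you already have a ranked list of gallery image IDs per query and a set of relevant IDs for each query. The function names and the toy IDs are made up for illustration.

```python
def recall_at_k(rankings, relevant, k):
    """Fraction of queries with at least one relevant item in the top k results."""
    hits = 0
    for ranking, rel in zip(rankings, relevant):
        if any(item in rel for item in ranking[:k]):
            hits += 1
    return hits / len(rankings)


def mean_reciprocal_rank(rankings, relevant):
    """Average of 1 / rank of the first relevant item (0 if none is retrieved)."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for position, item in enumerate(ranking, start=1):
            if item in rel:
                total += 1.0 / position
                break
    return total / len(rankings)


# Toy example: two text queries, each with a ranked list of image IDs.
rankings = [["img3", "img1", "img2"], ["img2", "img3", "img1"]]
relevant = [{"img1"}, {"img1"}]

print(recall_at_k(rankings, relevant, 1))        # 0.0
print(recall_at_k(rankings, relevant, 2))        # 0.5
print(mean_reciprocal_rank(rankings, relevant))  # (1/2 + 1/3) / 2
```

With CLIP, the ranked lists would come from sorting the gallery images by cosine similarity to each query embedding.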
I came from the Discord server. I'm here to help :))