How to apply MiniGPT-v2 to ITM (image-text matching) tasks
I would like to use MiniGPT-v2 for image classification. The idea is to compute an ITM (image-text matching) score between the image and the text of each category, then pick the category with the highest score. Could you please guide me on how to obtain an ITM model for MiniGPT-v2, similar to the ones available for BLIP and BLIP2?
BLIP-itm: https://github.com/salesforce/LAVIS/blob/main/examples/blip_image_text_matching.ipynb
BLIP2-itm: https://github.com/salesforce/LAVIS/blob/main/examples/blip2_image_text_matching.ipynb
These two models can directly compute the ITM score between an image and category text, which makes them suitable for image classification. They do this without the LLM (large language model) module. How can MiniGPT-v2 achieve something similar?
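
For reference, here is a minimal sketch of the classification loop I have in mind. It only shows how per-category ITM scores would be turned into a prediction. `classify_by_itm`, `score_fn`, and the prompt template are hypothetical names of my own, not part of MiniGPT-v2 or LAVIS; `score_fn` stands in for whatever ITM scorer MiniGPT-v2 could expose.

```python
from typing import Any, Callable, Dict, Sequence, Tuple


def classify_by_itm(
    image: Any,
    labels: Sequence[str],
    score_fn: Callable[[Any, str], float],
    template: str = "a photo of a {}",
) -> Tuple[str, Dict[str, float]]:
    """Pick the label whose text best matches the image.

    score_fn is a placeholder (assumption): it takes an image and a text
    string and returns a matching score, where higher means a better match.
    """
    if not labels:
        raise ValueError("labels must not be empty")

    # Score the image against one prompt per category.
    scores = {
        label: float(score_fn(image, template.format(label)))
        for label in labels
    }

    # The predicted class is the category with the highest score.
    best_label = max(scores, key=scores.get)
    return best_label, scores
```

With BLIP or BLIP2, `score_fn` could wrap the ITM head shown in the notebooks above. What I am missing is the equivalent scorer for MiniGPT-v2.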