CLIP-rsicd
Measure model "forgetfulness"
This was prompted by the query "avocado armchair" against the project demo during the HF community evaluation. The top result returned was this one:
The demo naturally returned the best match it could find within its corpus of aerial images. But the returned image does look somewhat like an avocado, which is probably due to the CLIP-RSICD model "remembering" parameters it learned as a CLIP model (i.e., before being fine-tuned on RSICD).
The suggested experiment is to evaluate image-caption pairs from (say) MS-COCO against both the baseline CLIP model and the CLIP-RSICD model, and compare their performance using either the top-k metric or the classification accuracy metric (described in issue-29).
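As a rough illustration of the kind of scoring the experiment would use, here is a minimal sketch of a top-k retrieval accuracy computation. It assumes we already have a similarity matrix (e.g., from CLIP's image and text embeddings) where entry `sim[i][j]` scores image `i` against caption `j`, and the matching caption for image `i` sits at index `i`; the function name and toy data are illustrative, not from the project code.

```python
def top_k_accuracy(sim, k):
    """Fraction of images whose matching caption (index i for image i)
    appears among the k highest-scoring captions."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank caption indices by similarity, highest first.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)

# Toy similarity matrix: image i vs caption j (higher = better match).
sim = [
    [0.9, 0.1, 0.2],  # image 0: correct caption (0.9) ranked 1st
    [0.3, 0.2, 0.8],  # image 1: correct caption (0.2) ranked 3rd
    [0.1, 0.8, 0.5],  # image 2: correct caption (0.5) ranked 2nd
]
print(top_k_accuracy(sim, 1))  # 1/3 of images matched at k=1
print(top_k_accuracy(sim, 2))  # 2/3 matched at k=2
```

Running the same computation on similarity matrices produced by the baseline CLIP model and by CLIP-RSICD over the same MS-COCO pairs would give directly comparable numbers, so a drop in the fine-tuned model's score would quantify how much it has "forgotten".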