CLIP
Implementation of MSCOCO retrieval metric
Can the author confirm how recall is computed for both text-to-image and image-to-text retrieval, given that there are 5 captions per image?
Please check this: https://github.com/openai/CLIP/issues/115
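For anyone landing here, this is a minimal sketch of one commonly used Recall@K convention for datasets with several captions per image. It is not confirmed to be what the CLIP authors used, and the linked issue may describe something different. The function name `recall_at_k`, the `sim` matrix layout (rows are images, columns are captions), and the `img_of_caption` mapping are all assumptions made for illustration. Under this convention, a text-to-image query counts as a hit if the caption's own image is in the top K, and an image-to-text query counts as a hit if at least one of the image's captions is in the top K.

```python
def recall_at_k(sim, img_of_caption, k):
    """Recall@K under one common convention (an assumption, not CLIP's confirmed protocol).

    sim[i][c]          -- similarity between image i and caption c (hypothetical layout)
    img_of_caption[c]  -- index of the image that caption c describes
    Returns (text_to_image_recall, image_to_text_recall).
    """
    n_images = len(sim)
    n_captions = len(img_of_caption)

    # Text-to-image: every caption is a query with exactly one correct image.
    t2i_hits = 0
    for c in range(n_captions):
        ranked_images = sorted(range(n_images), key=lambda i: -sim[i][c])
        t2i_hits += img_of_caption[c] in ranked_images[:k]

    # Image-to-text: every image is a query; it is a hit if ANY of its
    # ground-truth captions appears among the top-k retrieved captions.
    i2t_hits = 0
    for i in range(n_images):
        ranked_captions = sorted(range(n_captions), key=lambda c: -sim[i][c])
        i2t_hits += any(img_of_caption[c] == i for c in ranked_captions[:k])

    return t2i_hits / n_captions, i2t_hits / n_images


# Toy data: 2 images, 2 captions each (MSCOCO would have 5 per image).
toy_sim = [
    [0.9, 0.2, 0.8, 0.1],  # image 0 vs captions 0..3
    [0.3, 0.7, 0.4, 0.6],  # image 1 vs captions 0..3
]
toy_img_of_caption = [0, 0, 1, 1]
t2i_r1, i2t_r1 = recall_at_k(toy_sim, toy_img_of_caption, k=1)
```

Note that the two directions are averaged over different query counts (captions versus images), so with 5 captions per image the text-to-image number is averaged over five times as many queries as the image-to-text number.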