Results from the FashionIQ dataset
I used OpenAI's CLIP ViT-B/32 to test on FashionIQ's validation set. The results I obtained are very different from those reported in the paper; may I ask what tricks or settings I might be missing? Were the results in the paper produced with OpenAI's CLIP ViT-B/32? They seem closer to the OpenCLIP results.
| Backbone | dress_Recall@1 | dress_Recall@5 | dress_Recall@10 | dress_Recall@50 |
|---|---|---|---|---|
| OpenAI CLIP ViT-B/32 | 3.47 | 9.87 | 14.53 | 33.22 |
| OpenCLIP ViT-B/32 | 7.44 | 18.74 | 25.33 | 46.50 |
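For reference, here is a minimal sketch of how Recall@K numbers like these are typically computed for a retrieval benchmark. The array names and shapes are assumptions for illustration, not the repo's actual evaluation code:

```python
import numpy as np

def recall_at_k(similarities: np.ndarray, target_indices: np.ndarray, ks=(1, 5, 10, 50)):
    """Compute Recall@K for a retrieval task.

    similarities:   (num_queries, num_gallery) query-to-gallery score matrix.
    target_indices: (num_queries,) index of the correct gallery item per query.
    Returns a dict mapping K -> fraction of queries whose target is ranked in the top K.
    """
    # Rank gallery items for each query from highest to lowest score.
    rankings = np.argsort(-similarities, axis=1)
    results = {}
    for k in ks:
        hits = (rankings[:, :k] == target_indices[:, None]).any(axis=1)
        results[k] = hits.mean()
    return results
```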
Hello, thanks for your interest in our work! I just went through regenerating the results on different benchmarks and realized that for Fashion-IQ they do seem to be based on the OpenCLIP series of models (your results look very similar to what I am getting with the older gpt3.5-turbo generated captions). I'm sorry for the confusion, and I hope this helps.
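In case it helps others hitting the same discrepancy, here is one way the two ViT-B/32 backbones can be loaded through the open_clip library so they are easy to compare. The exact LAION pretrained tag used for the paper's numbers is not stated above, so the one below is only an assumption:

```python
import open_clip

# Original OpenAI ViT-B/32 weights (what the question above evaluated).
openai_model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)

# A LAION-trained OpenCLIP ViT-B/32 checkpoint (what the reported numbers
# appear to correspond to). NOTE: this specific tag is an assumption, not
# confirmed by the maintainer's reply.
laion_model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)

tokenizer = open_clip.get_tokenizer("ViT-B-32")
```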
OK, thank you for the clarification.