BLIP

Question about COCO, SBU, CC3M datasets

Open 4fee8fea opened this issue 3 years ago • 1 comments

Hi, @LiJunnan1992

Thanks for your great work and for making it public.

I followed the link in DATA.md to download the MS-COCO 2014 train images, the 2014 val images, and the Karpathy split.

I can access 123,287 images and 646,767 captions, which differs from the 113K images and 567K captions reported in the paper.

May I ask what the reason for the difference is?

Also, if one has access to the whole SBU and CC3M datasets, do you think it is a good idea to use all of their image-text pairs?

Thanks in advance

4fee8fea Sep 04 '22 04:09

Hi, thanks for your questions.

The Karpathy split is different from the 2014 split.
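
One way to see where the numbers diverge is to count images and captions per split rather than in total. The following is a minimal sketch, not the BLIP preprocessing code. It assumes the Karpathy file follows the commonly distributed layout, a top-level `images` list whose entries carry a `split` field and a `sentences` list. The helper name `count_by_split` and the tiny `sample` data are hypothetical.

```python
from collections import Counter


def count_by_split(karpathy):
    """Return {split: (num_images, num_captions)} for a Karpathy-style dict.

    Assumes each entry in karpathy["images"] has a "split" string and a
    "sentences" list (one item per caption). This layout is an assumption.
    """
    images = Counter()
    captions = Counter()
    for img in karpathy["images"]:
        split = img["split"]
        images[split] += 1
        captions[split] += len(img["sentences"])
    return {s: (images[s], captions[s]) for s in images}


# Tiny synthetic sample standing in for the real annotation file.
sample = {
    "images": [
        {"split": "train", "sentences": [{"raw": "a"}, {"raw": "b"}]},
        {"split": "restval", "sentences": [{"raw": "c"}]},
        {"split": "val", "sentences": [{"raw": "d"}]},
        {"split": "test", "sentences": [{"raw": "e"}, {"raw": "f"}]},
    ]
}

counts = count_by_split(sample)
for split, (n_images, n_captions) in sorted(counts.items()):
    print(f"{split}: {n_images} images, {n_captions} captions")
```

To run this on the real data, load the annotation file with `json.load` and pass the resulting dict to `count_by_split`. If the figures in the paper cover only the training portion of the Karpathy split, the per-split counts should make that visible.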

Yes, it is good to use the entire SBU and CC3M datasets if you have access to them.

LiJunnan1992 Sep 21 '22 23:09