ImageNet 21k based filtered dataset

Open isidentical opened this issue 1 year ago • 1 comments

Image-based filtering. We select a subset of examples whose visual content overlaps with ImageNet classes. After applying English language (fasttext) and caption length filtering, we cluster the image embeddings extracted by the OpenAI ViT-L/14 model for each image into 100K groups using Faiss [75]. We then find the nearest neighbor group for every ImageNet training example, and keep examples belonging to these groups. We apply this procedure using either ImageNet-21K (14M images) or ImageNet-1K (1.2M images), forming two subsets.
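For readers following along, here is a minimal sketch of the procedure described in the excerpt above, as I understand it. It is not the DataComp code: a toy pure-Python k-means stands in for Faiss, the 2-D points stand in for ViT-L/14 embeddings, and the names `kmeans`, `nearest`, and `image_based_filter` are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def nearest(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, k, iters=10):
    """Toy Lloyd's k-means (stand-in for Faiss); seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def image_based_filter(pool_emb, reference_emb, k):
    """Keep pool indices whose cluster is the nearest cluster of some reference example."""
    centroids = kmeans(pool_emb, k)
    kept_clusters = {nearest(r, centroids) for r in reference_emb}
    return [i for i, p in enumerate(pool_emb) if nearest(p, centroids) in kept_clusters]


# Hypothetical 2-D "embeddings": three well-separated blobs in the pool.
pool = [(0.0, 0.0), (10.0, 10.0), (-10.0, 10.0),
        (0.1, 0.2), (10.2, 9.9), (-9.8, 10.1)]
# Stand-in for ImageNet training embeddings; they fall near two of the blobs.
reference = [(0.3, -0.1), (9.5, 10.5)]

kept = image_based_filter(pool, reference, k=3)
print(kept)
```

Under this reading, swapping ImageNet-1K for ImageNet-21K only changes the `reference` set; the clustering of the pool stays the same.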

The paper's description of the "Image filters" says that either ImageNet-21K or ImageNet-1K can be used. Looking at the code, however, and at DataComp 1B in particular, it looks like only IN1K is used. Is there a version of DataComp 1B built with IN21K?

May 12 '24 01:05 isidentical

Hi @isidentical, thanks for the question! In our scaling experiments we scaled both the IN1k and IN21k strategies up to the large pool (filtering 1.28B samples). Comparing the rows Image-based clustering (ImageNet1k) and Image-based clustering (ImageNet21k) in Table 27 of the paper, we saw average performance of 0.481 vs. 0.471, respectively. Hence, we only scaled up the IN1k strategy to the xlarge pool (filtering 12.8B samples). Unfortunately, we don't have an IN21k version of DataComp 1B on hand.

May 15 '24 14:05 sagadre