Pretraining dataset

Open: mactavish91 opened this issue 2 years ago • 1 comment

Thank you for your excellent work. I'm currently training my own CLIP model and have a question. If I train on the LAION-2B, COYO-700M, and DataComp datasets together, will it yield better results? Should I deduplicate the data?

mactavish91, Dec 28 '23 05:12

Hi @mactavish91, we don't have those exact experiments, but there are some relevant ones in Table 18 of our paper (https://arxiv.org/pdf/2304.14108.pdf).

gabrielilharco, Dec 28 '23 18:12