What are the compute resources required for this?
Can this be trained on a V100 32GB or A100 40GB? The paper mentions A100 but doesn't say whether it is 40GB or 80GB.
We use A100 80G GPUs, but an A100 40G or V100 is also enough for training EVA-CLIP. Just set the batch size to 1 if you run into "out of memory" errors.
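For reference, on a memory-limited GPU you can also combine the smaller per-step batch with gradient accumulation so the effective batch size stays larger. A minimal PyTorch sketch (this is not the CLIPSelf training script; `model`, `loss_fn`, and `loader` are hypothetical placeholders):

```python
import torch

# Sketch only: simulate a larger effective batch on a memory-limited GPU
# (e.g. V100 32G / A100 40G) by pairing a per-step batch size of 1 with
# gradient accumulation. `model`, `loss_fn`, `loader` are placeholders.

accumulation_steps = 16  # effective batch size = 1 * accumulation_steps

def train_one_epoch(model, loss_fn, loader, optimizer, device="cuda"):
    model.train()
    optimizer.zero_grad()
    for step, (images, targets) in enumerate(loader):
        images, targets = images.to(device), targets.to(device)
        loss = loss_fn(model(images), targets) / accumulation_steps
        loss.backward()
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```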
Could you please provide the estimated training time for CLIPSelf on the COCO and CC3M datasets? Thanks!
The training of EVA-CLIP models is fast. With 8 A100-80G GPUs, it takes about 2 hours to train a ViT-B/16 on COCO for 6 epochs, and about 6 hours for ViT-L/14.
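For anyone reproducing these timings, the setup assumed above is standard single-node data-parallel training, one process per GPU. A minimal sketch (not the repo's actual script; `build_model` and `build_loader` are hypothetical placeholders), launched with `torchrun --nproc_per_node=8 train.py`:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Sketch only: 8-GPU DDP training loop matching the timings quoted above.
# `build_model` / `build_loader` are placeholders for the real construction.

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(build_model().cuda(local_rank), device_ids=[local_rank])
    loader = build_loader(rank=dist.get_rank())  # per-rank shard of COCO

    for epoch in range(6):  # 6 epochs: ~2h for ViT-B/16 on 8 A100-80G
        for images, targets in loader:
            ...  # forward / loss / backward / optimizer step

if __name__ == "__main__":
    main()
```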
Thank you for your reply!
Hello, you mentioned here that "A100 40G or V100 is also enough to train EVA-CLIP". Does that also refer to 8 A100 40G or 8 V100 GPUs? Our lab only has two 3090s, so I would also like to ask: what is the minimum configuration required for the open-vocabulary object detection experiments in the paper?