
What are the compute resources required for this?

Open yxchng opened this issue 2 years ago • 5 comments

Can this be trained on a V100 32GB or A100 40GB? The paper mentions A100 but doesn't say whether it is the 40GB or 80GB variant.

yxchng avatar Nov 09 '23 05:11 yxchng

We use A100 80GB GPUs, but A100 40GB or V100 can also be enough for training EVA-CLIP. Just set the batch size to 1 in case of "out of memory".
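The advice above (drop the batch size when memory runs out) can be sketched as a simple back-off loop. This is a minimal illustration, not CLIPSelf's actual training code: `run_step` is a hypothetical stand-in for one training step, and `MemoryError` stands in for `torch.cuda.OutOfMemoryError`.

```python
def find_fitting_batch_size(run_step, start=32, floor=1):
    """Return the largest batch size <= start that run_step accepts,
    halving on each out-of-memory failure down to a floor of 1."""
    bs = start
    while True:
        try:
            run_step(bs)  # hypothetical single training step
            return bs
        except MemoryError:  # stand-in for torch.cuda.OutOfMemoryError
            if bs <= floor:
                raise  # even the smallest batch does not fit
            bs = max(floor, bs // 2)

# Example: pretend any batch larger than 4 exhausts GPU memory.
def fake_step(bs):
    if bs > 4:
        raise MemoryError

print(find_fitting_batch_size(fake_step))  # → 4
```

In practice you would set the batch size once in the training config rather than probing at runtime, but the loop shows the idea behind "reduce until it fits".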

wusize avatar Nov 09 '23 05:11 wusize

Could you please provide the estimated training time for CLIPSelf on the COCO and CC3M datasets? Thanks!

cilinyan avatar Feb 06 '24 17:02 cilinyan

The training of EVA-CLIP models is fast. With 8 A100-80GB GPUs, it takes about 2 hours to train a ViT-B/16 on COCO for 6 epochs, and about 6 hours for ViT-L/14.
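For anyone budgeting compute, the figures above work out to a simple per-epoch estimate (assuming the stated 8-GPU setup and that time scales roughly linearly with epochs):

```python
# Back-of-envelope per-epoch time from the reported totals (8x A100-80GB).
TOTAL_HOURS = {"ViT-B/16": 2, "ViT-L/14": 6}
EPOCHS = 6

for model, hours in TOTAL_HOURS.items():
    print(f"{model}: {hours * 60 / EPOCHS:.0f} min/epoch")
# → ViT-B/16: 20 min/epoch
# → ViT-L/14: 60 min/epoch
```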

wusize avatar Feb 07 '24 03:02 wusize

Thank you for your reply!

> The training of EVA-CLIP models is fast. With 8 A100-80GB GPUs, it takes about 2 hours to train a ViT-B/16 on COCO for 6 epochs, and about 6 hours for ViT-L/14.

cilinyan avatar Feb 07 '24 16:02 cilinyan

> We use A100 80GB GPUs, but A100 40GB or V100 can also be enough for training EVA-CLIP. Just set the batch size to 1 in case of "out of memory".

Hello, you mentioned here that "A100 40g or V100 is also enough to train EVA-CLIP". Do you mean 8 A100 40GB or 8 V100 GPUs? Our lab has two RTX 3090s, so I would also like to ask: what is the minimum configuration required to reproduce the open-vocabulary object detection results in the paper?

kunpeng337 avatar Dec 11 '24 08:12 kunpeng337