RepDistiller

Question on memory consumption for CRD loss when the dataset is very large

Open TMaysGGS opened this issue 4 years ago • 3 comments

Hi,

Thank you for your great work, which helps me a lot.

I want to ask about the CRD contrast memory. In the ContrastMemory class, two buffers are created as random tensors, each of shape (number of data points, number of features). Assuming 128 features, these buffers become very large when training on a big dataset such as Glint360K. I tried to use CRD for my face recognition project, where the dataset has 17,091,657 images, and the buffers consume so much GPU memory that there is no room left for training.
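A rough back-of-envelope estimate (assuming fp32 buffers of shape (n_data, feat_dim); the exact layout inside ContrastMemory may differ slightly):

```python
# Hypothetical sizing check for the two CRD memory banks (not RepDistiller code).
n_data, feat_dim, bytes_per_float = 17_091_657, 128, 4  # Glint360K-scale dataset, fp32
one_buffer = n_data * feat_dim * bytes_per_float         # ~8.75 GB per buffer
two_buffers = 2 * one_buffer                              # ~17.5 GB just for the memory banks
print(one_buffer / 1e9, two_buffers / 1e9)
```

That is on top of the model, activations, and optimizer state, which is why training no longer fits.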

Could you tell me whether I am understanding this part correctly, and if so, whether there is any solution to this problem? Thanks.

TMaysGGS avatar Mar 03 '21 09:03 TMaysGGS

Hey TMays, I am coming across the same issue here. Have you been able to solve it?

Xinxinatg avatar Jul 09 '21 12:07 Xinxinatg

Sorry, not yet. Since the original distillation method works well enough for me, I am not adding any extra loss to my training for now.

TMaysGGS avatar Sep 01 '21 16:09 TMaysGGS

Hi @TMaysGGS, sorry for the late reply; maybe you have already figured it out. But if you are interested, there are two solutions:

  • You can use the momentum encoder trick from the MoCo paper, so that you only need a fixed-length queue of negatives instead of a buffer over the whole dataset (a minimal sketch of this idea follows the list).
  • If you can maintain a large batch size, you can compute the contrastive loss directly over in-batch negatives, without any memory buffer.
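For the first option, here is a minimal PyTorch sketch of a MoCo-style fixed-length negative queue. This is not part of RepDistiller; the class name, queue size, and temperature are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

class QueueContrast(torch.nn.Module):
    """Fixed-length negative queue in the spirit of MoCo, used in place of the
    (n_data, feat_dim) ContrastMemory buffers. Hypothetical sketch, not the CRD implementation."""
    def __init__(self, feat_dim=128, queue_size=65536, temperature=0.07):
        super().__init__()
        self.t = temperature
        # Queue of negative keys, initialized randomly and L2-normalized.
        self.register_buffer("queue", F.normalize(torch.randn(queue_size, feat_dim), dim=1))
        self.register_buffer("ptr", torch.zeros(1, dtype=torch.long))

    @torch.no_grad()
    def _enqueue(self, keys):
        # Overwrite the oldest entries with the newest keys.
        n = keys.shape[0]
        ptr = int(self.ptr)
        assert self.queue.shape[0] % n == 0  # assume queue size divides evenly by batch size
        self.queue[ptr:ptr + n] = keys
        self.ptr[0] = (ptr + n) % self.queue.shape[0]

    def forward(self, q_student, k_teacher):
        # q_student: student features; k_teacher: teacher (or momentum-encoder) features.
        # Both are assumed L2-normalized, shape (batch, feat_dim).
        pos = torch.einsum("nc,nc->n", q_student, k_teacher).unsqueeze(-1)       # (N, 1)
        neg = torch.einsum("nc,kc->nk", q_student, self.queue.clone().detach())  # (N, K)
        logits = torch.cat([pos, neg], dim=1) / self.t
        labels = torch.zeros(logits.shape[0], dtype=torch.long, device=logits.device)
        loss = F.cross_entropy(logits, labels)  # InfoNCE: positive sits at index 0
        self._enqueue(k_teacher.detach())
        return loss
```

With a large enough batch, the queue can be dropped entirely and the negatives taken from the other samples in the batch, which is the second option above.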

HobbitLong avatar Nov 19 '21 07:11 HobbitLong