
About the memory consumption

Open · YiningWang2 opened this issue on Mar 13, 2022 · 1 comment

Hi, when I read the supplementary information of AlphaFold2, I got confused about section "1.11.8 Reducing the memory consumption". It says that with gradient checkpointing, the memory consumption during training can be reduced from cubic to quadratic in the number of residues, and that at inference time, chunking the computation in the layers likewise brings the memory from cubic down to quadratic. I don't understand why this works. Can anyone give me a hand?

YiningWang2 · Mar 13, 2022

For training, AlphaFold uses activation checkpointing; see PyTorch's `torch.utils.checkpoint` interface and the activation-checkpointing paper. Instead of keeping every intermediate activation alive for the backward pass, only the inputs of each checkpointed block are stored, and the activations inside the block are recomputed during backward. This trades extra compute for a much smaller activation footprint.
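To make the idea concrete, here is a minimal, hypothetical PyTorch sketch (illustrative names, not FastFold's actual code): each block in a deep stack is wrapped with `torch.utils.checkpoint.checkpoint`, so only the block inputs are saved for backward.

```python
import torch
from torch.utils.checkpoint import checkpoint

class BlockStack(torch.nn.Module):
    """Toy stand-in for a deep stack of Evoformer-style blocks (names are illustrative)."""

    def __init__(self, dim: int, depth: int):
        super().__init__()
        self.blocks = torch.nn.ModuleList(
            [
                torch.nn.Sequential(
                    torch.nn.LayerNorm(dim),
                    torch.nn.Linear(dim, dim),
                    torch.nn.ReLU(),
                )
                for _ in range(depth)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # Without checkpointing, every intermediate activation of every block
            # stays alive until the backward pass. checkpoint() keeps only the
            # block input and recomputes the block's activations during backward.
            x = checkpoint(block, x)
        return x
```

With this, activation memory grows with the number of blocks only through the saved block inputs, not through all the intermediates inside each block.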

For inference, the representations in AlphaFold have two residue (sequence) dimensions. While the computation proceeds along one of them, the other can be treated as a batch dimension, so the work can be done sequentially in chunks along that dimension. Only one chunk's intermediate results are alive at a time, which reduces the peak temporary memory needed during the computation.
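As an illustration of the chunking idea (a sketch under assumptions, not FastFold's actual implementation): for a pair representation of shape `[N_res, N_res, C]`, an operation that treats the first residue dimension purely as a batch dimension can be applied chunk by chunk along that dimension.

```python
import torch

def chunked_apply(op, pair: torch.Tensor, chunk_size: int = 64) -> torch.Tensor:
    """Apply `op` along the first residue dimension of `pair` in chunks.

    `op` must treat its first dimension as a batch dimension (e.g. a row-wise
    attention on a [N_res, N_res, C] pair representation). For an op whose
    intermediates scale with the number of rows it sees at once (row-wise
    attention produces N_res x N_res logits per row), this drops the peak
    temporary memory from cubic in N_res to chunk_size * N_res^2.
    """
    outputs = []
    for start in range(0, pair.shape[0], chunk_size):
        # Only this chunk's intermediate activations exist at any moment.
        outputs.append(op(pair[start:start + chunk_size]))
    return torch.cat(outputs, dim=0)


# Example: a toy row-wise operation on a 256x256x32 pair representation.
pair = torch.randn(256, 256, 32)
out = chunked_apply(lambda rows: torch.softmax(rows, dim=1), pair, chunk_size=32)
assert out.shape == pair.shape
```

The final result is identical to running `op` on the whole tensor at once; only the order of computation and the peak memory change.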

Shenggan · Mar 14, 2022