ml-cvnets
MobileViT uses more GPU memory than AlexNet?
Is this normal? I noticed that MobileViT has no more than 10 M parameters.
This is expected because the architectures are different. MobileViT (and other transformer-based models) involves operations (e.g., dot-product attention) that have no learnable parameters but still require GPU memory to store activations that are needed for forward/backward propagation.
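
A minimal sketch of the point above (not taken from the ml-cvnets code; the batch, token, and channel sizes are hypothetical): a multi-head attention layer has far fewer parameters than its activation footprint suggests, because the token-by-token attention map carries no weights yet must be kept in memory for backpropagation.

```python
import torch
import torch.nn as nn

# Hypothetical MobileViT-like sizes, chosen only for illustration.
batch, tokens, dim, heads = 8, 1024, 192, 4

attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)

def n_params(module: nn.Module) -> int:
    # Count learnable parameters only.
    return sum(p.numel() for p in module.parameters())

print(f"attention params: {n_params(attn) / 1e6:.2f} M")
print(f"conv params:      {n_params(conv) / 1e6:.2f} M")

# The attention score matrix alone has shape (batch * heads, tokens, tokens).
# It contains zero learnable parameters, but it is saved as an activation
# for the backward pass and grows quadratically with the number of tokens.
score_elems = batch * heads * tokens * tokens
print(f"attention-score activations: {score_elems * 4 / 2**20:.1f} MiB (fp32)")
```

With these sizes the attention scores alone take roughly 128 MiB per forward pass, even though the layer itself holds well under a million parameters, which is why a sub-10 M-parameter model can still use more GPU memory than AlexNet.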
I learned about this recently. Thanks for the further explanation!