ml-cvnets
MobileViT uses more GPU memory than AlexNet?
Is this normal? I noticed that MobileViT has no more than 10 M parameters.
This is expected because the architectures are different. MobileViT (and other transformer-based models) involves operations (e.g., dot-product attention) that have no learnable parameters but still require GPU memory to store activations that are needed for forward/backward propagation.
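
A minimal sketch of the point above (not taken from the ml-cvnets code; the batch, token, and channel sizes are hypothetical): a multi-head attention layer has far fewer parameters than its activation footprint suggests, because the token-by-token attention map carries no weights yet must be kept in memory for backpropagation.

```python
import torch
import torch.nn as nn

# Hypothetical MobileViT-like sizes, chosen only for illustration.
batch, tokens, dim, heads = 8, 1024, 192, 4

attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)

def n_params(module: nn.Module) -> int:
    # Count learnable parameters only.
    return sum(p.numel() for p in module.parameters())

print(f"attention params: {n_params(attn) / 1e6:.2f} M")
print(f"conv params:      {n_params(conv) / 1e6:.2f} M")

# The attention score matrix alone has shape (batch * heads, tokens, tokens).
# It contains zero learnable parameters, but it is saved as an activation
# for the backward pass and grows quadratically with the number of tokens.
score_elems = batch * heads * tokens * tokens
print(f"attention-score activations: {score_elems * 4 / 2**20:.1f} MiB (fp32)")
```

With these sizes the attention scores alone take roughly 128 MiB per forward pass, even though the layer itself holds well under a million parameters, which is why a sub-10 M-parameter model can still use more GPU memory than AlexNet.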
I learned about this recently. Thanks for the further explanation!