grad_sampler features to support ConvNeXt
🚀 Feature
Add grad_sampler support for the (currently unsupported) layers present in ConvNeXt.
Motivation
The validator returns several errors when processing ConvNeXt (from the timm library) because some of its layers lack grad_sample support.
Pitch
ConvNeXt is currently a state-of-the-art architecture on ImageNet that does not use BatchNorm, so it could potentially be trained with DP without the accuracy drop caused by removing BN (as observed on some ResNets). Supporting ConvNeXt might therefore yield significant improvements in achievable DP accuracy, even in large-scale experiments.
Additional context
Once the timm library is installed and the model is created (see code below), passing it through the validator lists the unsupported layers.
import timm
model = timm.create_model("convnext_tiny")
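For reference, a minimal sketch of the validation step, assuming Opacus's ModuleValidator (the returned errors name the layers that lack grad_sample support):

from opacus.validators import ModuleValidator

errors = ModuleValidator.validate(model, strict=False)
print(errors)  # lists the ConvNeXt layers without grad_sample support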
Thanks @FrancescoPinto, I'll take a look at this.
The incompatibility comes from the bare nn.Parameter() in ConvNeXt's block, which is not present in, say, a ResNet's block. It therefore requires explicit handling via a dedicated grad sampler.
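For illustration, a rough sketch of how such a dedicated grad sampler could be registered via Opacus's register_grad_sampler decorator. The LayerScale module below is a hypothetical stand-in for the per-channel nn.Parameter in a ConvNeXt block, and the exact grad-sampler signature may differ across Opacus versions:

import torch
import torch.nn as nn
from opacus.grad_sample import register_grad_sampler

class LayerScale(nn.Module):
    # Hypothetical stand-in for the per-channel scale (nn.Parameter) in a ConvNeXt block.
    def __init__(self, dim: int, init_value: float = 1e-6):
        super().__init__()
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x):
        # x: (batch, ..., dim)
        return x * self.gamma

@register_grad_sampler(LayerScale)
def compute_layer_scale_grad_sample(layer, activations, backprops):
    # Per-sample gradient of gamma: input times incoming gradient,
    # summed over every dimension except batch and channel.
    g = activations * backprops
    grad_sample = g.reshape(g.shape[0], -1, g.shape[-1]).sum(dim=1)
    return {layer.gamma: grad_sample}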
The pitch is certainly reasonable, and we'd welcome PRs adding this support :)
Marking it as a feature enhancement for future releases.
With the release of v1.2, it is now possible to compute custom grad samples using functorch, which might solve your problem @FrancescoPinto.
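For anyone landing here, a rough sketch of that path, assuming the force_functorch flag of GradSampleModule added around v1.2 (argument names may differ in other versions):

import timm
import torch
from opacus.grad_sample import GradSampleModule

model = timm.create_model("convnext_tiny", num_classes=10)
# force_functorch asks Opacus to derive per-sample gradients with functorch,
# even for layers that have no hand-written grad sampler.
gs_model = GradSampleModule(model, force_functorch=True)

loss = gs_model(torch.randn(4, 3, 224, 224)).sum()
loss.backward()
for name, p in gs_model.named_parameters():
    if getattr(p, "grad_sample", None) is not None:
        print(name, tuple(p.grad_sample.shape))  # one gradient per example in the batch
        break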
Closing the issue as it is solved by v1.2. If this solution does not work for you @FrancescoPinto, feel free to reopen.