DETR - Fine_tuning_DetrForObjectDetection_on_custom_dataset - Freeze Backbone (early layers)

Open bastlen opened this issue 4 years ago • 0 comments

Hello, first - nice repo, great work!

So, I'm curious how to freeze specific layers of resnet50, e.g. the first layers up to stage 3 (Detectron2 makes this possible). I was able to freeze the backbone completely by adding this:

    # freeze backbone layers
    for n, p in self.model.named_parameters():
        if "backbone" in n:
            p.requires_grad = False

in the "Train the model using PyTorch Lightning" section, right after:

    self.lr_backbone = lr_backbone
    self.weight_decay = weight_decay

As a result, 23.5 M parameters are frozen (resnet50 has around 23 M params):

    18.0 M    Trainable params (DETR)
    23.5 M    Non-trainable params (mostly resnet50)
    41.5 M    Total params
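For anyone who wants to reproduce a summary like this outside of PyTorch Lightning, here is a minimal sketch of the counting logic. It only assumes that each parameter exposes `requires_grad` and `numel()`, as PyTorch parameters do. `count_parameters` and `StubParam` are hypothetical names made up for this example, and the sizes are just the rounded figures from the summary, not real layer shapes.

```python
def count_parameters(parameters):
    """Return (trainable, frozen, total) element counts.

    `parameters` is any iterable of objects with `requires_grad` and
    `numel()`, for example `model.parameters()` in PyTorch.
    """
    trainable = 0
    frozen = 0
    for p in parameters:
        if p.requires_grad:
            trainable += p.numel()
        else:
            frozen += p.numel()
    return trainable, frozen, trainable + frozen


class StubParam:
    """Hypothetical stand-in for a torch parameter, so this runs without PyTorch."""

    def __init__(self, size, requires_grad=True):
        self._size = size
        self.requires_grad = requires_grad

    def numel(self):
        return self._size


# Rounded figures from the summary above, used purely as demo input.
demo_params = [
    StubParam(18_000_000),                       # trainable part
    StubParam(23_500_000, requires_grad=False),  # frozen part
]
trainable, frozen, total = count_parameters(demo_params)
print(f"{trainable / 1e6:.1f} M trainable, {frozen / 1e6:.1f} M frozen, {total / 1e6:.1f} M total")
```

With a real model the call would presumably be `count_parameters(self.model.parameters())`, run after the freezing loop.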

I have a small dataset, and I want to reduce the number of trainable parameters so the model generalizes better. Maybe someone can tell me how to freeze only those early layers.
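One possible approach, as a minimal sketch rather than a confirmed solution: match parameter names against per-stage prefixes instead of the blanket `"backbone"` check. The prefix `model.backbone.conv_encoder.model.` is an assumption read off the module printout in this issue and may differ between library versions, so it should be checked against what `named_parameters()` actually yields. `freeze_backbone_stages` and `FakeParam` are hypothetical helpers, and the demo parameter names are illustrative.

```python
# Assumed prefix, read off the printed module tree in this issue; verify it
# against the names that named_parameters() yields for your own model.
BACKBONE_PREFIX = "model.backbone.conv_encoder.model."


def freeze_backbone_stages(named_parameters,
                           stages=("conv1", "layer1", "layer2", "layer3"),
                           prefix=BACKBONE_PREFIX):
    """Disable gradients for parameters that live under the given stages.

    `named_parameters` is any iterable of (name, parameter) pairs, such as
    the result of `model.named_parameters()` in PyTorch.
    Returns the names of the parameters that were frozen.
    """
    # The trailing dot keeps "layer1" from also matching e.g. "layer10".
    targets = tuple(f"{prefix}{stage}." for stage in stages)
    frozen = []
    for name, param in named_parameters:
        if name.startswith(targets):
            param.requires_grad = False
            frozen.append(name)
    return frozen


class FakeParam:
    """Hypothetical stand-in for a torch parameter, so this runs without PyTorch."""

    def __init__(self):
        self.requires_grad = True


# Illustrative names only; a real model has many more parameters.
demo = {
    BACKBONE_PREFIX + "conv1.weight": FakeParam(),
    BACKBONE_PREFIX + "layer1.0.conv1.weight": FakeParam(),
    BACKBONE_PREFIX + "layer3.5.conv3.weight": FakeParam(),
    BACKBONE_PREFIX + "layer4.0.conv1.weight": FakeParam(),
    "model.some_non_backbone_module.weight": FakeParam(),
}
frozen_names = freeze_backbone_stages(demo.items())
still_trainable = [name for name, param in demo.items() if param.requires_grad]
print("frozen:", frozen_names)
print("trainable:", still_trainable)
```

In the notebook this would presumably replace the loop above with `freeze_backbone_stages(self.model.named_parameters())`, leaving `layer4` and the rest of DETR trainable. Matching on a full prefix with a trailing dot is a deliberate choice here: a plain substring test for `"conv1"` would also hit the `conv1` inside every Bottleneck of `layer4`.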

Thanks in advance

B

Here is the model (layer1 through layer3 were marked in bold in the original printout):

    DetrForObjectDetection(
      (model): DetrModel(
        (backbone): DetrConvModel(
          (conv_encoder): DetrTimmConvEncoder(
            (model): FeatureListNet(
              (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
              (bn1): DetrFrozenBatchNorm2d()
              (act1): ReLU(inplace=True)
              (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
              (layer1): Sequential(
                (0): Bottleneck(
                  (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                  (downsample): Sequential(
                    (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  )
                )
                (1): Bottleneck(
                  (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
                (2): Bottleneck(
                  (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
              )
              (layer2): Sequential(
                (0): Bottleneck(
                  (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                  (downsample): Sequential(
                    (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
                    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  )
                )
                (1): Bottleneck(
                  (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
                (2): Bottleneck(
                  (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
                (3): Bottleneck(
                  (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
              )
              (layer3): Sequential(
                (0): Bottleneck(
                  (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                  (downsample): Sequential(
                    (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
                    (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  )
                )
                (1): Bottleneck(
                  (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
                (2): Bottleneck(
                  (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
                (3): Bottleneck(
                  (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
                (4): Bottleneck(
                  (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
                (5): Bottleneck(
                  (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
              )
              (layer4): Sequential(
                (0): Bottleneck(
                  (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                  (downsample): Sequential(
                    (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
                    (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  )
                )
                (1): Bottleneck(
                  (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
                (2): Bottleneck(
                  (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn1): DetrFrozenBatchNorm2d()
                  (act1): ReLU(inplace=True)
                  (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
                  (bn2): DetrFrozenBatchNorm2d()
                  (act2): ReLU(inplace=True)
                  (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn3): DetrFrozenBatchNorm2d()
                  (act3): ReLU(inplace=True)
                )
              )
            )
          )
          (position_embedding): DetrSinePositionEmbedding()

bastlen avatar Dec 02 '21 16:12 bastlen