
kitti-pretrain loss and acc problem

Open XuekuanWang opened this issue 3 years ago • 8 comments

XuekuanWang avatar Nov 10 '22 06:11 XuekuanWang

Hello, we have trained the SimIPU KITTI model and found that the intro loss is lower than the cross loss, but its acc top1 is also lower.

2022-11-10 14:44:13,589 - mmdet - INFO - Epoch [122][30/58] lr: 2.964e-04, eta: 1:24:15, time: 2.173, data_time: 0.245, memory: 59185, cross_acc_top1: 71.7124, cross_acc_top5: 93.2814, cross_loss: 6.9118, intro_loss: 2.8096, intro_acc_top1: 36.9674, intro_acc_top5: 73.1265, loss: 9.7215
2022-11-10 14:46:14,595 - mmdet - INFO - Epoch [123][30/58] lr: 2.934e-04, eta: 1:23:11, time: 2.193, data_time: 0.189, memory: 59185, cross_acc_top1: 71.1550, cross_acc_top5: 92.9795, cross_loss: 6.9552, intro_loss: 2.8046, intro_acc_top1: 36.8098, intro_acc_top5: 72.9778, loss: 9.7598

I think the intro branch is easier to learn, so its accuracy should be higher. Is this result correct?

XuekuanWang avatar Nov 10 '22 06:11 XuekuanWang

Maybe not. Remember that we need to adopt a matching algorithm to get positive pairs in the intro branch. There could be mistakes in matching. Also, the positive pairs do not exactly locate at the same spatial position in 3D space. These things also make intro-learning more difficult.

However, while there are tons of problems, the intro-branch features live in the same representation space (i.e., both are extracted by PointNet in this work), so their feature similarity is higher than between image features and LiDAR features (cross branch). Hence, the intro branch can have a small loss (features are more similar) but a lower accuracy (positives are harder to distinguish from negatives).

This is from an intuitive view. To go a step further, I guess one would need to study when the contrastive loss actually becomes lower.
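To make the intuition above concrete, here is a minimal NumPy sketch (not the repo's code; the logit values are made up for illustration) showing that InfoNCE loss and top-1 accuracy measure different things: a batch whose candidates are all very similar can have a lower loss yet a lower top-1 accuracy than a batch with many well-separated negatives.

```python
import numpy as np

def info_nce_stats(logits, pos_idx=0):
    """logits: (batch, candidates) similarity scores; positive at pos_idx.

    Returns the mean InfoNCE (cross-entropy) loss and the top-1 accuracy.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    loss = -np.log(probs[:, pos_idx]).mean()
    top1 = (logits.argmax(axis=1) == pos_idx).mean()
    return loss, top1

# "Intro-like" case: few, highly similar candidates; the positive narrowly loses.
intro_logits = np.array([[2.00, 2.05]])
# "Cross-like" case: 99 well-separated negatives; the positive clearly wins.
cross_logits = np.concatenate([[[1.0]], np.full((1, 99), 0.9)], axis=1)

intro_loss, intro_top1 = info_nce_stats(intro_logits)  # small loss, 0% top-1
cross_loss, cross_top1 = info_nce_stats(cross_logits)  # larger loss, 100% top-1
```

Here the "intro-like" batch ends up with both the smaller loss and the smaller accuracy, matching the pattern in the training log above.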

zhyever avatar Nov 10 '22 07:11 zhyever

Thanks.

I also tried to reproduce the experimental results of the paper, but failed.

3D-AP

| | Easy | Mod | Hard |
| --- | --- | --- | --- |
| paper (with pretraining) | | 81.32% | 70.88% |
| paper (no pretraining) | | 79.17% | 68.58% |
| 100 epochs (pretrain) + 100 epochs (downstream task) | / | 79.49% | 68.54% |
| no pretraining, 40 epochs | / | / | |
| no pretraining, 100 epochs | / | / | |

What could be the reason?

1) Is the kitti-pretrain checkpoint correct? Is this log correct?

2022-11-09 03:34:19,568 - mmdet - INFO - Epoch [100][90/116] lr: 3.697e-04, eta: 0:00:20, time: 0.999, data_time: 0.033, memory: 29223, cross_acc_top1: 75.8513, cross_acc_top5: 95.9727, cross_loss: 6.0896, intro_loss: 2.7013, intro_acc_top1: 37.7396, intro_acc_top5: 73.4069, loss: 8.7908
2022-11-09 03:34:45,791 - mmdet - INFO - Saving checkpoint at 100 epochs

2) I tried to use the released kitti-pretrain model, but it fails: many keys are missing.

Can you help me reproduce the experimental results of the paper? Thanks.

XuekuanWang avatar Nov 10 '22 08:11 XuekuanWang

Please refer to the 3D detection log presented here. The model performance in the last several epochs is consistently better than the baseline without pretraining. No other tricks were adopted in our experiments, and the log corresponds exactly to the experiment reported in our paper.

For the pre-trained model, please let me know what keys are missing. I have no idea if I upload the wrong models.

zhyever avatar Nov 10 '22 08:11 zhyever

OK, thanks. Here is my log when loading the pretrained model SimIPU_kitti_50e.pth. There is a warning about an "unexpected key in source state_dict".

'please set runner in your config.', UserWarning)
2022-11-10 17:54:12,058 - mmdet - INFO - load checkpoint from local path: /root/paddlejob/workspace/env_run/kuan/exp/simipu/SimIPU_kitti_50e.pth
2022-11-10 17:54:12,135 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: backbone.conv1.weight, backbone.bn1.weight, backbone.bn1.bias, backbone.bn1.running_mean, backbone.bn1.running_var, backbone.bn1.num_batches_tracked, backbone.layer1.0.conv1.weight, backbone.layer1.0.bn1.weight, backbone.layer1.0.bn1.bias, backbone.layer1.0.bn1.running_mean, backbone.layer1.0.bn1.running_var, backbone.layer1.0.bn1.num_batches_tracked, backbone.layer1.0.conv2.weight, backbone.layer1.0.bn2.weight, backbone.layer1.0.bn2.bias, backbone.layer1.0.bn2.running_mean, backbone.layer1.0.bn2.running_var, backbone.layer1.0.bn2.num_batches_tracked, backbone.layer1.0.conv3.weight, backbone.layer1.0.bn3.weight, backbone.layer1.0.bn3.bias, backbone.layer1.0.bn3.running_mean, backbone.layer1.0.bn3.running_var, backbone.layer1.0.bn3.num_batches_tracked, backbone.layer1.0.downsample.0.weight, backbone.layer1.0.downsample.1.weight, backbone.layer1.0.downsample.1.bias, backbone.layer1.0.downsample.1.running_mean, backbone.layer1.0.downsample.1.running_var, backbone.layer1.0.downsample.1.num_batches_tracked, backbone.layer1.1.conv1.weight, backbone.layer1.1.bn1.weight, backbone.layer1.1.bn1.bias, backbone.layer1.1.bn1.running_mean, backbone.layer1.1.bn1.running_var, backbone.layer1.1.bn1.num_batches_tracked, backbone.layer1.1.conv2.weight, backbone.layer1.1.bn2.weight, backbone.layer1.1.bn2.bias, backbone.layer1.1.bn2.running_mean, backbone.layer1.1.bn2.running_var, backbone.layer1.1.bn2.num_batches_tracked, backbone.layer1.1.conv3.weight, backbone.layer1.1.bn3.weight, backbone.layer1.1.bn3.bias, backbone.layer1.1.bn3.running_mean, backbone.layer1.1.bn3.running_var, backbone.layer1.1.bn3.num_batches_tracked, backbone.layer1.2.conv1.weight, backbone.layer1.2.bn1.weight, backbone.layer1.2.bn1.bias, backbone.layer1.2.bn1.running_mean, backbone.layer1.2.bn1.running_var, backbone.layer1.2.bn1.num_batches_tracked, backbone.layer1.2.conv2.weight, backbone.layer1.2.bn2.weight, backbone.layer1.2.bn2.bias, backbone.layer1.2.bn2.running_mean, 
backbone.layer1.2.bn2.running_var, backbone.layer1.2.bn2.num_batches_tracked, backbone.layer1.2.conv3.weight, backbone.layer1.2.bn3.weight, backbone.layer1.2.bn3.bias, backbone.layer1.2.bn3.running_mean, backbone.layer1.2.bn3.running_var, backbone.layer1.2.bn3.num_batches_tracked, backbone.layer2.0.conv1.weight, backbone.layer2.0.bn1.weight, backbone.layer2.0.bn1.bias, backbone.layer2.0.bn1.running_mean, backbone.layer2.0.bn1.running_var, backbone.layer2.0.bn1.num_batches_tracked, backbone.layer2.0.conv2.weight, backbone.layer2.0.bn2.weight, backbone.layer2.0.bn2.bias, backbone.layer2.0.bn2.running_mean, backbone.layer2.0.bn2.running_var, backbone.layer2.0.bn2.num_batches_tracked, backbone.layer2.0.conv3.weight, backbone.layer2.0.bn3.weight, backbone.layer2.0.bn3.bias, backbone.layer2.0.bn3.running_mean, backbone.layer2.0.bn3.running_var, backbone.layer2.0.bn3.num_batches_tracked, backbone.layer2.0.downsample.0.weight, backbone.layer2.0.downsample.1.weight, backbone.layer2.0.downsample.1.bias, backbone.layer2.0.downsample.1.running_mean, backbone.layer2.0.downsample.1.running_var, backbone.layer2.0.downsample.1.num_batches_tracked, backbone.layer2.1.conv1.weight, backbone.layer2.1.bn1.weight, backbone.layer2.1.bn1.bias, backbone.layer2.1.bn1.running_mean, backbone.layer2.1.bn1.running_var, backbone.layer2.1.bn1.num_batches_tracked, backbone.layer2.1.conv2.weight, backbone.layer2.1.bn2.weight, backbone.l
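One quick way to see where such a warning comes from is to diff the checkpoint's keys against the model's keys yourself (a plain-Python sketch; the variable names and the toy key lists below are hypothetical, not taken from the repo):

```python
def diff_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected) key lists, mirroring mmcv's warning."""
    missing = sorted(set(model_keys) - set(ckpt_keys))      # in model, not in checkpoint
    unexpected = sorted(set(ckpt_keys) - set(model_keys))   # in checkpoint, not in model
    return missing, unexpected

# Toy illustration of the situation above: the detector expects `img_backbone.*`
# while the checkpoint stores the same weights under `backbone.*`.
model_keys = ["img_backbone.conv1.weight", "img_backbone.bn1.weight"]
ckpt_keys = ["backbone.conv1.weight", "backbone.bn1.weight"]
missing, unexpected = diff_keys(model_keys, ckpt_keys)
```

If every "unexpected" key is just a prefix variant of a "missing" key, the weights themselves are fine and only the naming differs.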

XuekuanWang avatar Nov 10 '22 11:11 XuekuanWang

Could you please try the other SimIPU pre-trained models, so that I can determine whether I uploaded the wrong models?

zhyever avatar Nov 10 '22 11:11 zhyever

OK, I will try the other pretrained models. Below are the parameters of moca_r50_kitti. Are the mismatched keys "backbone.conv1.weight, backbone.bn1.weight,"?

(img_backbone): ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (layer1): ResLayer( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): SyncBatchNorm(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 
1), stride=(1, 1), bias=False) (bn3): SyncBatchNorm(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) )

XuekuanWang avatar Nov 10 '22 12:11 XuekuanWang

:D Hi, I would like to ask if there is any update.

Remember that during pre-training there are actually two encoders (image and point cloud). So when loading the parameters, the point cloud part may be mismatched while the image encoder parameters are loaded successfully.

If there is a bug, you can rename the keys in the parameter dict provided in this repo. For example, you may change the `backbone` prefix to `img_backbone`.
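The renaming suggested above can be sketched as follows (assumptions: the checkpoint follows the usual `torch.save` layout with a `state_dict` entry, and `backbone.` / `img_backbone.` are the right prefixes for your config; adjust as needed):

```python
def remap_prefix(state_dict, old="backbone.", new="img_backbone."):
    """Rename every key starting with `old` so it starts with `new` instead."""
    out = {}
    for key, value in state_dict.items():
        if key.startswith(old):
            out[new + key[len(old):]] = value
        else:
            out[key] = value
    return out

# Usage sketch (requires PyTorch; file names are from the thread above):
#   ckpt = torch.load("SimIPU_kitti_50e.pth", map_location="cpu")
#   sd = ckpt.get("state_dict", ckpt)
#   ckpt["state_dict"] = remap_prefix(sd)
#   torch.save(ckpt, "SimIPU_kitti_50e_img_backbone.pth")
```

Keys that do not carry the `backbone.` prefix (e.g. the point cloud encoder's) pass through unchanged, so the downstream config can still decide which parts to load.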

zhyever avatar Nov 29 '22 08:11 zhyever