img_norm_cfg and img_scale

Open lakshya-4gp opened this issue 4 years ago • 1 comments

In configs/soft_teacher.py/base.py, the following statistics are used for img_norm_cfg:

img_norm_cfg = dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)

However, coco_instance.py in mmdet has the following img_norm_cfg:

mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375].

Can you please clarify the discrepancy?

Also, I am unsure why multiple scales are used in img_scale=[(1333, 400), (1333, 1200)], and how img_scale relates to the image scale the backbone was pretrained with. For example, I've trained a swin_transformer at a 224x224 image size. I would assume the backbone then receives 224x224 images, so what exactly does img_scale do?

Feb 09 '22 13:02 lakshya-4gp

img_norm_cfg is used to normalize the image. In our config, we use mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False because we use pretrained weights from caffe. mmdetection, however, uses pretrained weights from torchvision by default. These two pretrained models were trained with different statistics.
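To make the difference concrete, here is a minimal NumPy sketch of what a normalization step does with each config. It is an illustration, not mmcv's actual implementation: the `normalize` helper and the constant names are hypothetical, the input is assumed to be a BGR image (as loaded by OpenCV), and `to_rgb=True` for the torchvision-style config is taken from mmdetection's default rather than from this thread. Note that the two mean lists hold the same three values in opposite channel order.

```python
import numpy as np


def normalize(img_bgr, mean, std, to_rgb):
    """Illustrative stand-in for a Normalize step (hypothetical helper)."""
    img = img_bgr.astype(np.float64)
    if to_rgb:
        img = img[..., ::-1]  # reverse the channel axis: BGR -> RGB
    return (img - np.array(mean)) / np.array(std)


# Caffe-style: keep BGR order, subtract per-channel means, no std scaling.
CAFFE_CFG = dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
# Torchvision-style: convert to RGB, subtract means, divide by std.
# (to_rgb=True is assumed here; it is not quoted in this thread.)
TORCHVISION_CFG = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# A single BGR pixel whose color equals the dataset mean.
mean_pixel = np.array([[[103.530, 116.280, 123.675]]])
# A single black pixel.
black_pixel = np.zeros((1, 1, 3))

caffe_mean = normalize(mean_pixel, **CAFFE_CFG)
tv_mean = normalize(mean_pixel, **TORCHVISION_CFG)
caffe_black = normalize(black_pixel, **CAFFE_CFG)
tv_black = normalize(black_pixel, **TORCHVISION_CFG)

print(caffe_mean.ravel(), tv_mean.ravel())    # both are all zeros
print(caffe_black.ravel(), tv_black.ravel())  # very different value ranges
```

Both configs center the mean pixel at zero, but the caffe-style output stays on the 0-255 scale while the torchvision-style output is divided by the per-channel std. A backbone should be fed the convention its pretrained weights were trained with.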
img_scale doesn't have to match the pretraining image scale. Even if, as in your case, the pretrained model was trained at 224x224, it is still better to train the detector at a larger scale because it is hard to locate objects in a small image.
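As a rough sketch of how two img_scale entries can be used, the snippet below assumes the resize step runs in a "range" multi-scale mode with the aspect ratio kept: a target (long edge, short edge) pair is sampled between the two entries, and the image is scaled so that it fits inside that pair. The helper names `sample_scale` and `rescale_size` are hypothetical and written from scratch for illustration; whether the config in question uses exactly this mode is an assumption.

```python
import random


def sample_scale(img_scales, rng):
    """Sample a (long_edge, short_edge) target between the given scales."""
    long_edges = [max(s) for s in img_scales]
    short_edges = [min(s) for s in img_scales]
    long_edge = rng.randint(min(long_edges), max(long_edges))
    short_edge = rng.randint(min(short_edges), max(short_edges))
    return long_edge, short_edge


def rescale_size(width, height, scale):
    """Resize (width, height) to fit inside scale while keeping aspect ratio."""
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(width, height), max_short / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


img_scales = [(1333, 400), (1333, 1200)]
rng = random.Random(0)  # seeded only to make the sketch reproducible

# Each training iteration would draw a different target scale.
targets = [sample_scale(img_scales, rng) for _ in range(5)]

# A 640x480 image resized to fit a fixed (1333, 800) target.
fixed = rescale_size(640, 480, (1333, 800))
print(fixed)  # (1067, 800)

# A 224x224 image is upscaled to the smallest sampled short edge, not kept at 224.
small = rescale_size(224, 224, (1333, 400))
print(small)  # (400, 400)
```

Under these assumptions the detector sees inputs whose short edge varies between 400 and 1200 pixels, independent of the 224x224 resolution used for backbone pretraining.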

Feb 09 '22 15:02 MendelXu