
img_norm_cfg and img_scale

Open lakshya-4gp opened this issue 4 years ago • 1 comments

In configs/soft_teacher.py/base.py, the following statistics are used for img_norm:

img_norm_cfg = dict(mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)

However, coco_instance.py in mmdet has the following img_norm_cfg:

mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375].

Can you please clarify the discrepancy?

Also, I am unsure why multiple img_scale values are used, img_scale=[(1333, 400), (1333, 1200)], and how this img_scale relates to the pretraining img_scale of the backbone. For example, I've trained a swin_transformer at 224x224 image size. I would assume the backbone receives a 224x224 image, so what exactly does img_scale do?

lakshya-4gp, Feb 09 '22 13:02

  • img_norm_cfg is used to normalize the image. In our config we use mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False because we use pretrained weights from Caffe. mmdetection, however, uses pretrained weights from torchvision by default. These two pretrained models were trained with different statistics.
  • img_scale doesn't have to match the pretraining img_scale. Even if, as in your case, the pretrained model was trained at 224x224, it is still better to train the detector at a larger scale, because it is hard to locate objects in a small image.
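
To make both points concrete, here is a minimal NumPy sketch, not the actual mmdetection or SoftTeacher code. The mean and std values are the ones quoted in this thread; note that the two mean vectors are the same numbers in reversed channel order (BGR vs. RGB). The helper names `normalize` and `rescale_keep_ratio` are hypothetical, `to_rgb=True` for the torchvision-style config is an assumption (the snippet quoted above does not show it), and the rescale formula is an assumption about how a keep-ratio resize treats an img_scale tuple as (max long edge, max short edge).

```python
import numpy as np

# Statistics quoted in the thread.
CAFFE_MEAN = np.array([103.530, 116.280, 123.675], dtype=np.float32)  # BGR order
CAFFE_STD = np.array([1.0, 1.0, 1.0], dtype=np.float32)
TV_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)  # RGB order
TV_STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def normalize(img_bgr, mean, std, to_rgb):
    """Hypothetical helper: normalize an HxWx3 image loaded in BGR order."""
    img = img_bgr.astype(np.float32)
    if to_rgb:
        img = img[..., ::-1]  # reverse the channel axis: BGR -> RGB
    return (img - mean) / std


def rescale_keep_ratio(width, height, scale):
    """Hypothetical helper: fit (width, height) inside `scale` without
    changing the aspect ratio. Assumes `scale` means
    (max long edge, max short edge)."""
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(width, height), max_short / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


# A single BGR pixel that equals the Caffe mean.
pixel = np.array([[[103.530, 116.280, 123.675]]], dtype=np.float32)
caffe_out = normalize(pixel, CAFFE_MEAN, CAFFE_STD, to_rgb=False)
tv_out = normalize(pixel, TV_MEAN, TV_STD, to_rgb=True)  # to_rgb=True is assumed

# A 640x480 image at the two ends of img_scale=[(1333, 400), (1333, 1200)].
small = rescale_keep_ratio(640, 480, (1333, 400))
large = rescale_keep_ratio(640, 480, (1333, 1200))
```

Under these assumptions, the same pixel maps to zero in both schemes, so the two configs subtract the same per-channel means and differ only in channel order and in whether the result is divided by a std. For the resize, a 640x480 image ends up between roughly 533x400 and 1333x1000, well above the 224x224 used to pretrain the backbone.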

MendelXu, Feb 09 '22 15:02