Questions about evaluating on the Cityscapes dataset
Thanks for your great work. I was just wondering how you evaluate on the Cityscapes dataset. After reading your code, it seems that you trained the model on 512x512 inputs and evaluate directly on the original image size (1024x2048):
```python
if opts.crop_val:
    val_transform = et.ExtCompose([
        et.ExtResize(opts.crop_size),      # resize the shorter side to crop_size
        et.ExtCenterCrop(opts.crop_size),  # center crop (not random) to crop_size x crop_size, e.g. 512x512
        et.ExtToTensor(),
        et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
    ])
else:
    val_transform = et.ExtCompose([        # no resizing: evaluate on the full 1024x2048 image
        et.ExtToTensor(),
        et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
    ])
```
Why can the same model be evaluated on a different input image size? Thanks.
I have the same question.
The DeepLab models were trained on 512x512 patches and evaluated on full 1024x2048 images.
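This works because the network is fully convolutional: there is no fixed-size fully connected layer, so it accepts inputs of any spatial size and produces a prediction map of matching size. Here is a minimal sketch illustrating that, using torchvision's DeepLabV3 as a stand-in for this repo's model (random weights are enough to demonstrate the shape behavior):

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# 19 classes matches the Cityscapes evaluation classes.
model = deeplabv3_resnet50(num_classes=19).eval()

with torch.no_grad():
    for h, w in [(512, 512), (1024, 2048)]:  # training-crop size vs full Cityscapes frame
        out = model(torch.randn(1, 3, h, w))["out"]
        print(out.shape)  # -> [1, 19, 512, 512] and [1, 19, 1024, 2048]
```

The weights are identical in both calls; only the spatial extent of the feature maps changes, so a model trained on crops can run inference on whole images.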
The training protocol from the DeepLabv3 paper:

> We adopt the same training protocol as before except that we employ 90K training iterations, crop size equal to 769, and running inference on the whole image.
There are two main reasons for training on cropped images: 1) a larger batch size, which gives more accurate BatchNorm statistics (a full 1024x2048 frame has 8x the pixels of a 512x512 crop, so cropping lets roughly 8x more images fit in the same GPU memory), and 2) more efficient training with lower resource consumption. Of course, if you have sufficient GPU resources, it is better to train on larger inputs, e.g., full images or 769x769 patches. A sketch of the crop-based training pipeline is below.
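For reference, this is roughly what the matching training-time transform looks like. `et.ExtRandomCrop` and `et.ExtRandomHorizontalFlip` are assumed here to be joint image-and-mask transforms like the `et.Ext*` ones above; check `utils/ext_transforms.py` in the repo for the exact names and signatures.

```python
# Hedged sketch: random 512x512 crops at training time keep per-GPU batches
# large (better BN statistics) and memory use low, while evaluation runs on
# the uncropped 1024x2048 frames.
# et.ExtRandomCrop / et.ExtRandomHorizontalFlip are assumed joint transforms;
# verify against utils/ext_transforms.py.
train_transform = et.ExtCompose([
    et.ExtRandomCrop(size=(opts.crop_size, opts.crop_size)),  # e.g. 512x512 patches
    et.ExtRandomHorizontalFlip(),
    et.ExtToTensor(),
    et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
])
```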