DeepLabV3Plus-Pytorch

Questions about evaluating cityscapes dataset

Open weiaicunzai opened this issue 5 years ago • 2 comments

Thanks for your great work. I was just wondering how you evaluate on the Cityscapes dataset. After reading your code, it seems that you trained the model on 512x512 inputs but evaluate directly on the original image size (1024 x 2048):

    if opts.crop_val:
        val_transform = et.ExtCompose([
            et.ExtResize(opts.crop_size),      # resize shorter side to crop_size
            et.ExtCenterCrop(opts.crop_size),  # then center-crop to crop_size x crop_size
            et.ExtToTensor(),
            et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225]),
        ])
    else:
        # no resizing or cropping: evaluate on the full-resolution image
        val_transform = et.ExtCompose([
            et.ExtToTensor(),
            et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225]),
        ])

Why can the same model be evaluated on a different input image size? Thanks.

weiaicunzai avatar Dec 02 '20 10:12 weiaicunzai

I have the same question.

13717630148 avatar Jun 09 '21 02:06 13717630148

The DeepLab models were trained on 512x512 patches and evaluated on full images (1024x2048).

The training protocol from the DeepLabV3 paper:

    We adopt the same training protocol as before except that we employ 90K
    training iterations, crop size equal to 769, and running inference on the
    whole image.
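This works because DeepLab is fully convolutional: no layer assumes a fixed spatial size, so the same weights run on any input resolution and the logits simply come out at the corresponding size. A minimal sketch, using torchvision's deeplabv3_resnet50 as a stand-in for the models in this repo:

    import torch
    from torchvision.models.segmentation import deeplabv3_resnet50

    # Stand-in model: any DeepLabV3-style network is fully convolutional,
    # so the same weights accept arbitrary input resolutions.
    model = deeplabv3_resnet50(num_classes=19).eval()  # 19 Cityscapes classes

    with torch.no_grad():
        # 512x512 patch, as used during training
        out_crop = model(torch.randn(1, 3, 512, 512))['out']
        # full Cityscapes frame, as used during evaluation
        out_full = model(torch.randn(1, 3, 1024, 2048))['out']

    print(out_crop.shape)  # torch.Size([1, 19, 512, 512])
    print(out_full.shape)  # torch.Size([1, 19, 1024, 2048])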

There are two main reasons for training on cropped images: 1) a larger batch size, which gives more accurate BN statistics, and 2) more efficient training with lower resource consumption. Of course, if you have sufficient GPU resources, it would be better to train on larger inputs, e.g., full images or 769x769 patches.
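For reference, the training side typically pairs the full-image validation shown above with a random crop. A sketch of such a training transform; the exact names (et.ExtRandomCrop, et.ExtRandomHorizontalFlip) are assumed from this repo's ext_transforms module, so check main.py for the actual pipeline:

    # Assumed ext_transforms API; see the repo's main.py for the real pipeline.
    train_transform = et.ExtCompose([
        # random 512x512 patch from the full 1024x2048 image
        et.ExtRandomCrop(size=(opts.crop_size, opts.crop_size)),
        et.ExtRandomHorizontalFlip(),
        et.ExtToTensor(),
        et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
    ])

Since BatchNorm statistics are computed per batch, the smaller 512x512 crops let you fit a much larger batch in the same GPU memory, which is exactly reason 1) above.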

VainF avatar Jun 09 '21 05:06 VainF