Questions about evaluating on the Cityscapes dataset
Thanks for your great work. I was just wondering how you evaluate on the Cityscapes dataset. After reading your code, it seems that you trained the model on 512x512 inputs and evaluate directly on the original image size (1024x2048):
```python
if opts.crop_val:
    val_transform = et.ExtCompose([
        et.ExtResize(opts.crop_size),      # resize the shorter side to crop_size
        et.ExtCenterCrop(opts.crop_size),  # center crop (not random) to crop_size x crop_size, e.g. 512x512
        et.ExtToTensor(),
        et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
    ])
else:
    val_transform = et.ExtCompose([        # no resizing: evaluate on the full 1024x2048 image
        et.ExtToTensor(),
        et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
    ])
```
Why can the same model be evaluated on a different input image size? Thanks.
I have the same question.
The DeepLab models were trained on 512x512 patches and evaluated on full 1024x2048 images.
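This works because the network is fully convolutional: there is no fixed-size fully connected layer, so it accepts inputs of any spatial size and produces a prediction map of matching size. Here is a minimal sketch illustrating that, using torchvision's DeepLabV3 as a stand-in for this repo's model (random weights are enough to demonstrate the shape behavior):

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# 19 classes matches the Cityscapes evaluation classes.
model = deeplabv3_resnet50(num_classes=19).eval()

with torch.no_grad():
    for h, w in [(512, 512), (1024, 2048)]:  # training-crop size vs full Cityscapes frame
        out = model(torch.randn(1, 3, h, w))["out"]
        print(out.shape)  # -> [1, 19, 512, 512] and [1, 19, 1024, 2048]
```

The weights are identical in both calls; only the spatial extent of the feature maps changes, so a model trained on crops can run inference on whole images.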
The training protocol from the DeepLabv3 paper:

> We adopt the same training protocol as before except that we employ 90K training iterations, crop size equal to 769, and running inference on the whole image.
There are two main reasons for training on cropped images: 1) a larger batch size, which gives more accurate BatchNorm statistics (a full 1024x2048 frame has 8x the pixels of a 512x512 crop, so cropping lets roughly 8x more images fit in the same GPU memory), and 2) more efficient training with lower resource consumption. Of course, if you have sufficient GPU resources, it is better to train on larger inputs, e.g., full images or 769x769 patches. A sketch of the crop-based training pipeline is below.
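For reference, this is roughly what the matching training-time transform looks like. `et.ExtRandomCrop` and `et.ExtRandomHorizontalFlip` are assumed here to be joint image-and-mask transforms like the `et.Ext*` ones above; check `utils/ext_transforms.py` in the repo for the exact names and signatures.

```python
# Hedged sketch: random 512x512 crops at training time keep per-GPU batches
# large (better BN statistics) and memory use low, while evaluation runs on
# the uncropped 1024x2048 frames.
# et.ExtRandomCrop / et.ExtRandomHorizontalFlip are assumed joint transforms;
# verify against utils/ext_transforms.py.
train_transform = et.ExtCompose([
    et.ExtRandomCrop(size=(opts.crop_size, opts.crop_size)),  # e.g. 512x512 patches
    et.ExtRandomHorizontalFlip(),
    et.ExtToTensor(),
    et.ExtNormalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
])
```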