PSPNet

From which model did you fine-tune?

shansiliu opened this issue 8 years ago • 3 comments

Hi, you said you used ResNet-101 with dilated convolutions, but I can't find any pre-trained ResNet with dilated convolutions on the Internet. I would like to know more details.

shansiliu · Jun 03 '17

Hi

TL;DR: just use a ResNet-101 trained on ImageNet without dilated convolutions.

In my opinion, if one considers a ResNet-101 with dilated convolutions as described by Yu and Koltun [1] (without the context network), the receptive field of every neuron is exactly the same as in the plain ResNet-101. Indeed, increasing the dilation factor of the convolutions compensates for the removed striding, so it emulates the effect of the max pooling it replaces. Therefore, if you take a ResNet-101 trained on ImageNet, you can output dense predictions (over the 1000 classes) using dilated convolutions without having to train a specific model with dilated convolutions.
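As a concrete illustration of this weight-reuse idea, here is a minimal sketch, assuming PyTorch and a recent torchvision rather than the original Caffe code: the standard ImageNet weights of `resnet101` are loaded unchanged, and only the last two stages are switched to stride 1 with dilated convolutions, which yields 1/8-resolution dense features.

```python
import torch
import torchvision

# Stages conv4_x and conv5_x keep stride 1 and use dilations 2 and 4 instead,
# which preserves the receptive fields while producing 1/8-resolution features.
backbone = torchvision.models.resnet101(
    weights=torchvision.models.ResNet101_Weights.IMAGENET1K_V1,
    replace_stride_with_dilation=[False, True, True],
)

# Drop the classification head (avgpool + fc) and keep the convolutional trunk.
trunk = torch.nn.Sequential(*list(backbone.children())[:-2])

x = torch.randn(1, 3, 473, 473)   # PSPNet's usual crop size
print(trunk(x).shape)             # torch.Size([1, 2048, 60, 60]), i.e. ~1/8 resolution
```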

This can be very useful. For instance, in DeepLabv3 [2], Chen et al. train the model at 1/16 resolution because of memory limitations and produce dense predictions at 1/8 resolution at test time.
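In the same torchvision-based setting as above (an assumption for illustration, not what Chen et al. actually used), that trick amounts to building the backbone with different dilation settings for training and testing and sharing the exact same parameters between the two:

```python
import torchvision

# Training configuration: only conv5_x is dilated -> output stride 16 (cheaper).
train_backbone = torchvision.models.resnet101(
    weights=torchvision.models.ResNet101_Weights.IMAGENET1K_V1,
    replace_stride_with_dilation=[False, False, True],
)

# ... train the segmentation network on top of train_backbone ...

# Test configuration: conv4_x and conv5_x are dilated -> output stride 8 (denser).
test_backbone = torchvision.models.resnet101(
    replace_stride_with_dilation=[False, True, True],
)
# Dilation does not change the parameters, so the weights are interchangeable.
test_backbone.load_state_dict(train_backbone.state_dict())
```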

However, the issue of the modified first layers still remains (three 3x3 convolutions in PSPNet instead of the single 7x7 convolution in the original ResNet), since that stem cannot simply reuse the weights of a stock ImageNet-trained ResNet-101.
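For reference, a rough sketch of the two stems being compared (PyTorch assumed; the channel widths follow the public PSPNet prototxt but should be treated as illustrative):

```python
import torch.nn as nn

# Original ResNet stem: a single 7x7 convolution followed by max pooling.
resnet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# PSPNet-style stem: three 3x3 convolutions with the same overall stride.
pspnet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(128), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
```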

[1] Yu, Fisher, and Vladlen Koltun. "Multi-Scale Context Aggregation by Dilated Convolutions." arXiv preprint arXiv:1511.07122 (2015).
[2] Chen, Liang-Chieh, et al. "Rethinking Atrous Convolution for Semantic Image Segmentation." arXiv preprint arXiv:1706.05587 (2017).

howard-mahe · Jun 30 '17

@howard-mahe May I ask why

the issue of the modified first layers still remains (three 3x3 convolutions in PSPNet instead of the single 7x7 convolution in the original ResNet)

What is the problem with using the 7x7 convolution?

erichhhhho · Aug 06 '19

I don't know; only the authors could answer that. Anyway, the main contributions of PSPNet are (1) the PSP module and (2) the observation that a large crop size, a large batch size, and fine-tuning the BN parameters matter, although that requires multi-GPU training.
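For context on (1), here is a minimal sketch of the pyramid pooling module in PyTorch (an assumed re-implementation, not the authors' Caffe code): the feature map is average-pooled into 1x1, 2x2, 3x3 and 6x6 grids, each branch is reduced with a 1x1 convolution, upsampled back to the input size, and concatenated with the input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_channels=2048, bins=(1, 2, 3, 6)):
        super().__init__()
        reduction = in_channels // len(bins)  # 512 channels per branch for 2048 inputs
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(bin_size),
                nn.Conv2d(in_channels, reduction, kernel_size=1, bias=False),
                nn.BatchNorm2d(reduction),
                nn.ReLU(inplace=True),
            )
            for bin_size in bins
        )

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return torch.cat([x] + pooled, dim=1)  # 2048 + 4*512 = 4096 channels

psp = PyramidPooling()
out = psp(torch.randn(1, 2048, 60, 60))
print(out.shape)  # torch.Size([1, 4096, 60, 60])
```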

howard-mahe · Aug 06 '19