
Context Module introduced in PyramidBox paper

Open ignasi00 opened this issue 3 years ago • 1 comments

Just to give some context: I was asked to pick a paper and reproduce some of its results from scratch (it counts for 50% of the course grade), and my deadline is around June 10, 2022.


While rewriting the detection network (in order to fully understand the paper), I found the CPM part strange and would like to ask for advice.


Paper Text

The paper says:

with Context Sensitive feature extractor followed by series of transpose convolutions to enhance spatial resolution of feature maps.

and

we augmented on top of each individual FPNs, a Context-sensitive Prediction Module (CPM) [63]. This contextual module consists of 4 Inception-ResNet-A blocks [62] with 128 and 256 filters for 3 × 3 convolution and 1024 filters for 1 × 1 convolution.

The reference 63 says:

We design the Context-sensitive Predict Module (CPM), see Fig. 3(b), in which we replace the convolution layers of context module in SSH by the residual-free prediction module of DSSD.


Issues

From the quotes above, I understand the CPM as an SSH context module with different convolution operations. But your Figure 4 (from the paper) and your code show a channel expansion, which looks like the prediction module of DSSD (a kind of simplified Inception), followed by a standard SSH.
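To make my reading concrete, here is a minimal sketch of the "channel expansion followed by SSH" structure as plain channel and receptive-field bookkeeping, without any deep learning framework. The function names, the dictionary layout, and the way I map the paper's filter counts (1024 for the 1 × 1 convolutions, 256 and 128 for the 3 × 3 convolutions) onto the branches are my own assumptions, not something taken from the repository.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions with the given kernels."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def sequential_cpm(in_channels, expand=1024, main=256, context=128):
    """Describe a DSSD-like expansion followed by an SSH-like context module.

    All layer widths are assumptions read off the paper's text, not the code.
    """
    # Stage 1 (DSSD-like): 1x1 convolutions only change the channel count.
    expansion_kernels = [1]

    # Stage 2 (SSH-like): parallel branches of stacked 3x3 convolutions,
    # concatenated along the channel axis.
    branches = [
        {"name": "3x3", "kernels": [3], "channels": main},
        {"name": "5x5-like", "kernels": [3, 3], "channels": context},
        {"name": "7x7-like", "kernels": [3, 3, 3], "channels": context},
    ]
    for branch in branches:
        branch["receptive_field"] = stacked_receptive_field(
            expansion_kernels + branch["kernels"]
        )

    return {
        "in_channels": in_channels,
        "expanded_channels": expand,
        "branches": branches,
        "out_channels": sum(b["channels"] for b in branches),
    }


summary = sequential_cpm(in_channels=256)
for branch in summary["branches"]:
    print(branch["name"], branch["channels"], branch["receptive_field"])
print("concatenated output channels:", summary["out_channels"])
```

Under these assumptions the module outputs 256 + 128 + 128 = 512 channels, and the three branches see 3 × 3, 5 × 5 and 7 × 7 neighbourhoods. Nothing in this structure resembles an Inception-ResNet-A block.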

I did not find any Inception-ResNet-A blocks.

Additionally, I did not find the transpose convolutions part.


Sorry for the inconvenience, I just want to make sure I don't miss any detail and have it done correctly as soon as possible...

ignasi00, May 27 '22 11:05

I have since found the transpose convolutions part in the CustomRPN class. I would still like to understand whether the sequential DSSD-then-SSH structure used as the CPM is in fact a novel solution that is merely credited to the PyramidBox paper, or whether it is the solution PyramidBox actually implements (as it was intended).

ignasi00, May 27 '22 14:05