InterpretDL Grad-CAM or Score-CAM visualization with a Mask R-CNN model

Hi there. I am trying to implement CAM visualizations with a Mask R-CNN model. As you know, Mask R-CNN performs classification per ROI, but the backbone network (e.g., an FPN with ResNet-50 conv blocks) extracts features over the entire input image. Could you provide some guidance on how to use InterpretDL to generate CAMs with a Mask R-CNN model? Much appreciated!

Jan 16 '21 05:01 jessecanada

Hi, @jessecanada

We have done something similar for a YOLO model in PaddleDetection. Let me find the code and see whether we can do the visualization directly for a Mask R-CNN model.

I'll get back to you on Monday or Tuesday ;)

Jan 16 '21 07:01 holyseven

Hi, @jessecanada

We've tried Grad-CAM on a Mask R-CNN model based on PaddleDetection. We are able to visualize/interpret a bounding box prediction and its confidence with respect to the ROI. One of the visualizations looks like this:

We are still trying to figure out how to incorporate object detection tasks into InterpretDL, but here is how we implemented it based on PaddleDetection:

Modify the get_prediction function so that it also outputs cls_prob and bbox_pred.
In the Mask R-CNN code, output roi_feat from single_scale_eval and, in build, compute and output the gradients of cls_prob or bbox_pred with respect to roi_feat.
In tools/infer.py, comment out the test-mode program so that gradients can be computed without error, then save the gradients and roi_feats for visualization.
Run tools/infer.py, specifying the architecture, weights, and image.
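
Once the gradients and roi_feats have been saved, the heatmap itself can be computed offline. Below is a minimal NumPy sketch of that last step, not the exact code we used. It assumes roi_feat and its gradient were saved for a single ROI as arrays of shape (C, H, W); the function names grad_cam and paste_into_image are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def grad_cam(roi_feat, grad):
    """Grad-CAM for a single ROI.

    roi_feat, grad: arrays of shape (C, H, W) (assumed layout), where grad is
    the gradient of the chosen score (e.g. one cls_prob entry) w.r.t. roi_feat.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    roi_feat = np.asarray(roi_feat, dtype=np.float64)
    grad = np.asarray(grad, dtype=np.float64)
    if roi_feat.ndim != 3 or roi_feat.shape != grad.shape:
        raise ValueError("roi_feat and grad must both have shape (C, H, W)")

    # Channel weights: global average of the gradient over spatial positions.
    weights = grad.mean(axis=(1, 2))                 # shape (C,)
    # Weighted sum of the feature maps over channels, followed by ReLU.
    cam = np.tensordot(weights, roi_feat, axes=1)    # shape (H, W)
    cam = np.maximum(cam, 0.0)

    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam


def paste_into_image(cam, box, image_hw):
    """Place an ROI heatmap into a full-size canvas (nearest-neighbour resize).

    box: (x1, y1, x2, y2) in pixel coordinates; image_hw: (height, width).
    """
    h_img, w_img = image_hw
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, w_img), min(y2, h_img)

    canvas = np.zeros((h_img, w_img), dtype=np.float64)
    if x2 <= x1 or y2 <= y1:
        return canvas  # box lies outside the image

    # Map every target pixel in the box back to a source cell of the heatmap.
    rows = np.arange(y2 - y1) * cam.shape[0] // (y2 - y1)
    cols = np.arange(x2 - x1) * cam.shape[1] // (x2 - x1)
    canvas[y1:y2, x1:x2] = cam[np.ix_(rows, cols)]
    return canvas
```

The resulting canvas can then be overlaid on the input image with any plotting tool. Nearest-neighbour resizing is used here only to keep the sketch dependency-free; a smoother interpolation would usually look better.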

If you have any further questions, please let us know!

Jan 21 '21 09:01 XuanyuWu123

Hi @XuanyuWu123

I'm looking for a way to visualize heatmaps on Mask R-CNN. Could you show me how to implement Grad-CAM on a Mask R-CNN?

thanks

Jul 13 '21 22:07 nomurakeiya

Hi @nomurakeiya @XuanyuWu123 Did you figure out how to implement Grad-CAM on Mask R-CNN?

Jan 05 '22 09:01 Kartiky246

Hello all,

Thanks for your interest in our repo.

For the implementation of Grad-CAM on Mask R-CNN, several points need to be clarified:

A Mask R-CNN has three outputs: the bounding box coordinates, the class (cls) prediction for that box, and the mask.
Explaining the cls prediction of a given box is straightforward: the explanation is the heatmap computed by Grad-CAM or a similar algorithm. Explanations of the bounding box coordinates or the mask, however, are not well defined. Please tell us if you would like other kinds of explanation results.
For the heatmap, one problem is that in eval mode Mask R-CNN applies an NMS step (bbox_post_process), which stops the computation of gradients, so the final outputs cannot be explained directly.
We can still compute the gradients of the raw outputs (where bbox_head outputs 1000 boxes), and Grad-CAM can be computed from those.
For PaddleDetection, we are still thinking about how to explain the model directly with our tool. For now, one possible way is to modify the PaddleDetection source code to get the feature map and gradients of a certain layer, and then compute Grad-CAM.
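
Since the post-NMS detections carry no gradients, one way to connect a final detection to the raw outputs is to look up the raw box that overlaps it most and take the gradient of that raw class score instead. The following is only a sketch of that lookup under stated assumptions: raw_boxes are assumed to be already decoded to image coordinates with shape (N, 4), raw_scores to have shape (N, num_classes), and the helper names iou_one_to_many and match_raw_output are hypothetical, not part of InterpretDL or PaddleDetection.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and N boxes, all given as (x1, y1, x2, y2)."""
    box = np.asarray(box, dtype=np.float64)
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)

    # Width and height of the intersection rectangle, clipped at zero.
    iw = np.clip(np.minimum(box[2], boxes[:, 2]) - np.maximum(box[0], boxes[:, 0]), 0, None)
    ih = np.clip(np.minimum(box[3], boxes[:, 3]) - np.maximum(box[1], boxes[:, 1]), 0, None)
    inter = iw * ih

    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area + areas - inter
    # Guard against degenerate boxes with zero union.
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def match_raw_output(det_box, det_label, raw_boxes, raw_scores):
    """Find the raw (pre-NMS) prediction that best overlaps a final detection.

    Returns (index, raw class score, IoU). The index identifies the raw score
    whose gradient would be used as the Grad-CAM target.
    """
    raw_scores = np.asarray(raw_scores, dtype=np.float64)
    ious = iou_one_to_many(det_box, raw_boxes)
    idx = int(np.argmax(ious))
    return idx, float(raw_scores[idx, det_label]), float(ious[idx])


# Toy example with three raw boxes and two classes.
raw_boxes = [[0, 0, 10, 10], [20, 20, 40, 40], [21, 19, 41, 39]]
raw_scores = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
idx, score, iou = match_raw_output([20, 20, 40, 40], 1, raw_boxes, raw_scores)
```

In the toy example the final detection coincides with the second raw box, so the lookup selects index 1 and its class-1 score. In a real model the gradient of that selected raw score with respect to the chosen feature map would then feed the usual Grad-CAM computation.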

Let us know if there are more questions.

Cheers

Jan 12 '22 09:01 holyseven

For anyone who is still interested in obtaining explanation heatmaps for Mask R-CNN or YOLO-like models, we have published a tutorial showing the visualization results. Hope this helps.

Jul 11 '22 08:07 holyseven