Grad-CAM or Score-CAM visualization with a Mask R-CNN model
Hi there. I am trying to implement CAM visualizations with a Mask R-CNN model. As you know, Mask R-CNN classifies each ROI separately, but the backbone network (e.g., an FPN with ResNet-50 conv blocks) extracts features over the entire input image. Could you provide some guidance on how to use InterpretDL to generate CAMs with a Mask R-CNN model? Much appreciated!
Hi, @jessecanada
We have done something similar for a YOLO model from PaddleDetection. Let me find the code and see if we can do the visualization directly for a Mask R-CNN model.
I'll get back to you on Monday or Tuesday ;)
Hi, @jessecanada
We've tried Grad-CAM on a Mask R-CNN model based on PaddleDetection. We are able to visualize/interpret a bounding box prediction and its confidence with respect to the ROI. One of the visualizations looks like this:


We are still trying to figure out how to incorporate object detection tasks into InterpretDL, but here is how we implemented it based on PaddleDetection:
- Modify the get_prediction function so that it also outputs cls_prob and bbox_pred.
- In the Mask R-CNN code, output roi_feat from single_scale_eval, and in build, compute and output the gradients of cls_prob or bbox_pred with respect to roi_feat.
- In tools/infer.py, comment out the test-mode program so that gradients can be computed without error, then save the gradients and roi_feat for visualization.
- Run tools/infer.py, specifying the architecture, weights, and image.
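
Once roi_feat and its gradients have been saved as in the steps above, the Grad-CAM heatmap itself is a small computation. Here is a minimal NumPy sketch of that last step, not the exact code we ran: the function name grad_cam and the (C, H, W) array layout are assumptions for illustration, and the random arrays stand in for the values saved from tools/infer.py.

```python
import numpy as np


def grad_cam(feature_map, gradients):
    """Compute a Grad-CAM heatmap for one ROI.

    feature_map, gradients: arrays of shape (C, H, W), where gradients is
    d(score)/d(feature_map) for the chosen score (e.g. one cls_prob entry).
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel weights: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))              # shape (C,)
    # Weighted sum of the feature channels.
    cam = np.tensordot(weights, feature_map, axes=1)   # shape (H, W)
    # Keep only the positive evidence for the score.
    cam = np.maximum(cam, 0.0)
    # Normalize for display; an all-zero map is left unchanged.
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam


# Placeholder data standing in for a saved roi_feat and its gradients.
rng = np.random.default_rng(0)
roi_feat = rng.random((256, 7, 7))
roi_grad = rng.standard_normal((256, 7, 7))
heatmap = grad_cam(roi_feat, roi_grad)
```

To overlay the result, resize the heatmap to the ROI's bounding box in the input image and blend it with the image.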
If you have any further questions, please let us know!
Hi @XuanyuWu123
I'm looking for a way to visualize heatmaps for Mask R-CNN. Could you show me how to implement Grad-CAM on a Mask R-CNN?
Thanks
Hi @nomurakeiya @XuanyuWu123 Did you figure out how to implement Grad-CAM on Mask R-CNN?
Hello all,
Thanks for your interest in our repo.
For the implementation of Grad-CAM on Mask R-CNN, there are several points that need to be clarified:
- A Mask R-CNN has three outputs: the bounding box coordinates, the class prediction for each box, and the mask.
- It is easy to explain the class prediction of a given box: that is the heatmap computed by Grad-CAM or other algorithms. However, explanations of the bounding box coordinates or the mask are not well defined. Please tell us if you want other kinds of explanation results.
- For the heatmap, one problem is that in eval mode, Mask R-CNN applies NMS in bbox_post_process, which blocks gradient computation, so the final outputs cannot be explained directly.
- However, we can still compute the gradients of the raw outputs (where bbox_head outputs 1000 boxes), and Grad-CAM can be computed from these.
- For PaddleDetection, we are still thinking about how to explain the model directly with our tool. For now, one option is to modify the PaddleDetection source code to get the feature map and gradients of a certain layer, and then compute Grad-CAM.
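
Since gradients are only available for the raw outputs, one practical question is which of the raw boxes corresponds to a final detection you want to explain. One possible approach, offered here as an assumption rather than something InterpretDL or PaddleDetection provides, is to pick the raw box with the highest IoU against the final box and explain that box's class score. The helper names iou and match_raw_box below are hypothetical, and boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_raw_box(final_box, raw_boxes):
    """Return the index of the raw box that overlaps final_box the most."""
    return max(range(len(raw_boxes)), key=lambda i: iou(final_box, raw_boxes[i]))


# Toy example: three raw boxes, one final detection after NMS.
raw_boxes = [(0, 0, 10, 10), (20, 20, 40, 40), (22, 18, 42, 38)]
final_box = (20, 20, 40, 40)
idx = match_raw_box(final_box, raw_boxes)
```

The returned index can then be used to select the class score whose gradients feed Grad-CAM. If box decoding or rescaling is applied between the raw outputs and the final ones, the raw boxes need to be brought into the same coordinate system before matching.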
Let us know if there are more questions.
Cheers
For anyone who is still interested in obtaining the explained heatmap for Mask R-CNN models or YOLO-like models, we have given a tutorial showing the visualization results. Hope this can help.