Mask R-CNN

Mask R-CNN Network

In principle, Mask R-CNN is an intuitive extension of Faster R-CNN, yet constructing the mask branch properly is critical for good results. Most importantly, Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs.
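To make the alignment problem concrete, here is a minimal sketch of what happens when a coordinate is mapped onto a coarse feature map and rounded to an integer cell, as RoIPool-style pooling does. The function name `quantization_shift` and the stride of 16 are illustrative assumptions, not part of any library.

```python
def quantization_shift(x, stride=16):
    """Return how far (in input pixels) a coordinate moves when it is
    mapped to the feature map, rounded to an integer cell, and mapped back.

    `stride` is an assumed feature-map stride, used here for illustration.
    """
    continuous = x / stride        # exact position on the feature map
    quantized = round(continuous)  # rounding to an integer cell
    return abs(quantized * stride - x)


# A box edge at x = 100 px lands on cell 6 (96 px), i.e. it moves by 4 px.
print(quantization_shift(100.0))
```

A shift of a few pixels barely affects classification, but it is large relative to the accuracy a per-pixel mask needs.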


To fix the misalignment, we propose a simple, quantization-free layer, called RoIAlign, that faithfully preserves exact spatial locations.

We note that the results are not sensitive to the exact sampling locations, or how many points are sampled, as long as no quantization is performed.
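The idea can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes a single-channel feature map stored as a list of rows, RoI coordinates already expressed in feature-map units, integer indices at cell centers, and average pooling over a regular grid of sample points. The names `bilinear` and `roi_align` and the parameters `out_size` and `samples` are hypothetical.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at a real-valued point."""
    h, w = len(feature), len(feature[0])
    # Clamp so that the four neighbours stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool an RoI into an out_size x out_size grid without any rounding.

    roi = (y_start, x_start, y_end, x_end) as floats in feature-map coordinates.
    Each output bin averages samples x samples bilinear samples.
    """
    y_start, x_start, y_end, x_end = roi
    bin_h = (y_end - y_start) / out_size
    bin_w = (x_end - x_start) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside the bin, kept as floats.
                    y = y_start + (i + (sy + 0.5) / samples) * bin_h
                    x = x_start + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out


# Usage: a ramp feature map whose value equals its column index.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
pooled = roi_align(ramp, (1.2, 1.5, 5.2, 5.5))
```

Because the RoI boundaries and sample points are never rounded, shifting the RoI by a fraction of a cell shifts the pooled values by a matching amount, which is the property the mask branch relies on.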

Mask R-CNN Results

Mask R-CNN results on the COCO test set. These results are based on ResNet-101, achieving a mask AP of 35.7 and running at 5 fps. Masks are shown in color, and bounding box, category, and confidences are also shown.

Mask R-CNN Code

The original code is implemented in Caffe2 as part of Detectron, Facebook's object detection library.

Note that Detectron is deprecated. Use Detectron2 instead, which is a rewrite in PyTorch.

Besides Mask R-CNN, the Detectron implementation includes the following object detection algorithms.

  • RetinaNet
  • RPN
  • R-FCN
  • Fast R-CNN
  • Faster R-CNN

It also supports several backbone networks, such as

  • ResNeXt{50, 101, 152}
  • ResNet{50, 101, 152}
  • FPN
  • VGG16

If you need another architecture as the network backbone, you can add it by following the instructions.

Mask R-CNN Benchmark

For more benchmarks and pre-trained weights, see the model zoo page in the official GitHub repository.

GitHub: Repository Link
Paper: arXiv Link
Authors: @KaimingHe, @GeorgiaGkioxari, @PiotrDollár, @RossGirshick
