Instance Segmentation with PyTorch and Mask R-CNN

In this article, you will get full hands-on experience with instance segmentation using PyTorch and Mask R-CNN. Image segmentation is one of the major application areas of deep learning and neural networks. One of the best-known image segmentation techniques where we apply deep learning is semantic segmentation. In semantic segmentation, we mask each class in an image with a single color mask, so different classes have differently colored masks. You can learn more about semantic segmentation from one of my previous articles. But in this article, we will focus on instance segmentation in deep learning using PyTorch and Mask R-CNN.

Take a look at the following image to get a better idea of instance segmentation.

Instance segmentation example
Figure 1. Instance segmentation in deep learning from the COCO dataset (Source).

Figure 1 shows how, in the left image, every person has a different color mask even though all of them belong to the person class. Similarly, all the sheep are also masked with different colors.

In this article, we will try to apply instance segmentation and achieve similar results as the above.

So, what will we be learning in this article?

  • We will not train our instance segmentation model in this tutorial. Instead, we will use the PyTorch Mask R-CNN model which has been trained on the COCO dataset.
  • We will start by learning a bit more about the Mask R-CNN model. Specifically, we will get to know about the input and output format of the model.
  • Then we will dive into the coding part with a very detailed explanation.
  • Finally, we will test the Mask R-CNN deep learning model by applying it to images.

PyTorch Mask R-CNN Deep Learning Model

Before moving into the input and output format of the Mask R-CNN model, let’s see what it actually does and how it does it.

The Working of Mask R-CNN Model

Let’s go over the working of Mask R-CNN and deep learning instance segmentation very briefly here.

We know that in semantic segmentation, each class in an image has a single color mask. But in instance segmentation, each instance of a class has a differently colored mask.

How do we achieve this then? In simple terms, we detect each object present in an image, get its bounding box, classify the object inside the bounding box, and mask it with a unique color. So, instance segmentation is a combination of object detection and image segmentation. It sounds simple, but in practice and training, it can become complicated really easily. This same method is also employed by the Mask R-CNN model.

Instance segmentation in deep learning
Figure 2. Instance segmentation using the Mask R-CNN deep learning model (Source).

What you see in figure 2 is an example of instance segmentation. You can see that each object is being detected and then a color mask is applied on it.

In fact, Mask R-CNN is a combination of the very famous Faster R-CNN deep learning object detector and image segmentation. We will not go into the technical details of the model here, but I highly recommend that you read the original Mask R-CNN paper here. And if you want to know more about image segmentation in general, then I recommend that you read one of my previous articles on image segmentation. It covers a lot of general topics like evaluation metrics, some major papers, and application areas of deep learning based image segmentation.

We need not worry much about the technical details of training such a model here. We will be using a pre-trained model provided by PyTorch. So, it is much more beneficial to know the input and output format of this pre-trained model, as that will help us with inference and coding.

The Input and Output Format of PyTorch Mask R-CNN Model

The Mask R-CNN pre-trained model that PyTorch provides has a ResNet-50-FPN backbone.

The model expects images in batches for inference, and all the pixel values should be within the range [0, 1]. So, the input format to the model is [N, C, H, W]. Here, N is the number of images or the batch size, C is the color channel dimension, and H & W are the height and width of the image respectively. This is the typical PyTorch input format.
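
To make that shape concrete, here is a minimal, self-contained sketch (not part of the tutorial scripts) that builds a dummy batch in exactly this format:

import torch

# a dummy 3-channel, 480x640 image with pixel values already in [0, 1]
image = torch.rand(3, 480, 640)
# add the batch dimension -> shape [1, 3, 480, 640], i.e. [N, C, H, W]
batch = image.unsqueeze(0)
print(batch.shape)  # torch.Size([1, 3, 480, 640])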

The model outputs a lot of content though. Remember that Mask R-CNN is a combination of object detection and image segmentation. During inference, the model outputs a list of dictionaries containing the resulting tensors. Formally, it is a List[Dict[Tensor]]. And the following are the contents, which I have taken from the PyTorch models website.

  • boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with values of x between 0 and W and values of y between 0 and H.
  • labels (Int64Tensor[N]): the predicted labels for each image
  • scores (Tensor[N]): the scores of each prediction.
  • masks (UInt8Tensor[N, 1, H, W]): the predicted masks for each instance, in 0-1 range. In order to obtain the final segmentation masks, the soft masks can be thresholded, generally with a value of 0.5 (mask >= 0.5).

So, the dictionary contains four keys: boxes, labels, scores, and masks. These keys contain the resulting tensors as values. And notice that we should only keep the mask values which are greater than or equal to 0.5.
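
As a quick sanity check, the following sketch (assuming model is the pre-trained Mask R-CNN, already in eval() mode, and batch is an input tensor of shape [1, 3, H, W] as above) prints the output keys and shapes and thresholds the soft masks:

with torch.no_grad():
    # outputs is a List[Dict[str, Tensor]], one dictionary per input image
    outputs = model(batch)

output = outputs[0]
print(output.keys())           # dict_keys(['boxes', 'labels', 'scores', 'masks'])
print(output['boxes'].shape)   # torch.Size([N, 4])
print(output['masks'].shape)   # torch.Size([N, 1, H, W])
# threshold the soft masks to get binary segmentation masks
binary_masks = output['masks'] >= 0.5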

I hope that the above details make some of the technicalities clearer. If not, things will become much clearer when we actually code our way through. Coding and applying the Mask R-CNN model to images will help us understand its working even better. So, let’s move further.

Project Directory Structure

Here, we will get to know about the project’s directory structure. I hope that you follow the same structure as in this tutorial, so that you can move on without any difficulty. The following is the directory structure that we will follow.

├───input
│       image1.jpg
│       image2.jpg
│       image3.jpg
│
├───outputs
│
└───src
    │   coco_names.py
    │   mask_rcnn_images.py
    │   utils.py

So, we have three folders.

  • The input folder contains the images that we will test the Mask R-CNN model on.
  • The outputs folder will contain all the segmented images after they have passed through the Mask R-CNN model.
  • And finally, we have the src folder which will contain the Python scripts.

You are free to use any image of your choice to run inference using the Mask R-CNN model. However, if you want to use the same images as in this tutorial, then you can download the zipped input file below. The images have been taken from Pixabay.

After downloading, extract the files inside the parent project directory.

Libraries That We Need

PyTorch is the only major library that we need for this tutorial. I have used PyTorch 1.6 for this project. So, you can go ahead and download PyTorch if you have not done so.
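
For reference, a typical installation command looks like the one below; the exact command depends on your Python version and CUDA setup, so it is best to take it from the official PyTorch Get Started page.

pip install torch torchvision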

All the other libraries are common deep learning and computer vision libraries, which you probably already have. If not, feel free to install them along the way.

Instance Segmentation using PyTorch and Mask R-CNN

From this section onward, we will start to write the code for instance segmentation on images using PyTorch and Mask R-CNN.

Let’s begin with defining all the COCO dataset’s class names in a Python script.

The COCO Dataset Class Names

We will keep all the class names separate from the other Python code so that our code remains clean.

Create a coco_names.py script inside the src folder and put the following list into it.

COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

That’s all we need for this Python script. We will import this wherever we need it.

Writing Some Utility Functions for Instance Segmentation

Now, let’s set up the utility script which will help us a lot in this tutorial. Basically, this will contain all the important functions, like the forward pass of the image through the model and applying the segmentation mask on the image.

Things will become clearer when we write the code. So, let’s jump directly into it.

All of this code will go into the utils.py script inside the src folder.

The following are the imports that we need.

import cv2
import numpy as np
import random
import torch

from coco_names import COCO_INSTANCE_CATEGORY_NAMES as coco_names

Note that we are importing the COCO_INSTANCE_CATEGORY_NAMES list from coco_names.py.

We have a total of 91 classes for segmentation and detection. And we want each instance of each class to have a differently colored mask.

We need a pool of random RGB color tuples to choose from, so that each detected object in an image can get its own color. The following simple line of code will do that for us.

# this will help us create a different color for each class
COLORS = np.random.uniform(0, 255, size=(len(coco_names), 3))

We can use the above generated colors in OpenCV draw functions.

Function to Get the Outputs

We will write a simple function to get the outputs from the model after inference. This function will provide us with all the output tensors that we need for proper visualization of the results. Let’s call this function get_outputs().

The following is the function definition.

def get_outputs(image, model, threshold):
    with torch.no_grad():
        # forward pass of the image through the model
        outputs = model(image)
    
    # get all the scores
    scores = list(outputs[0]['scores'].detach().cpu().numpy())
    # indices of those scores which are above a certain threshold
    thresholded_preds_indices = [scores.index(i) for i in scores if i > threshold]
    thresholded_preds_count = len(thresholded_preds_indices)
    # get the masks
    masks = (outputs[0]['masks']>0.5).squeeze().detach().cpu().numpy()
    # discard masks for objects which are below threshold
    masks = masks[:thresholded_preds_count]

    # get the bounding boxes, in (x1, y1), (x2, y2) format
    boxes = [[(int(i[0]), int(i[1])), (int(i[2]), int(i[3]))]  for i in outputs[0]['boxes'].detach().cpu()]
    # discard bounding boxes below threshold value
    boxes = boxes[:thresholded_preds_count]

    # get the class labels
    labels = [coco_names[i] for i in outputs[0]['labels']]
    return masks, boxes, labels

The get_outputs() function accepts three input parameters. The first one is the input image, the second one is the Mask R-CNN model, and the third is the threshold value. The threshold value is a pre-defined score below which we will discard all the outputs to avoid too many false positives. Let’s go over the code step by step.

  • First, we obtain the outputs by forward passing the image through the model inside a torch.no_grad() block. This provides us with a list containing a dictionary.
  • Next, we get all the scores from the dictionary and load them onto the CPU.
  • Then we build thresholded_preds_indices. This contains the indices of all the scores which are above the threshold that we have provided.
  • Getting the length of the above list in thresholded_preds_count will help us extract only the masks and bounding boxes that correspond to those scores.
  • After that, we obtain the masks by thresholding the soft mask values at 0.5.
  • The next line discards all the masks whose scores do not meet the threshold. We only retain those masks which are at least above the threshold score.
  • Similarly, we obtain the bounding boxes and discard all the low-scoring box detections, as we did in the case of the masks.
  • Then we get all the COCO dataset label names by mapping the resulting label indices to the coco_names list.
  • Finally, we return the masks, boxes, and labels.

I hope that you were able to understand the above steps. Try going over those again and you will get them for sure.

Applying Segmentation and Drawing Bounding Boxes

After we have the labels, masks, and the bounding boxes, now we can apply the color masks on the object and draw the bounding boxes as well.

We will again write a very simple function for that. The function is draw_segmentation_map(), which accepts four input parameters: image, masks, boxes, and labels. The image is the original image on which we will apply the resulting masks and draw the bounding boxes around the detected objects. Also, the labels will help us to put the class name on top of each object.

The following is the function definition.

def draw_segmentation_map(image, masks, boxes, labels):
    alpha = 1 # weight of the original image
    beta = 0.6 # transparency for the segmentation map
    gamma = 0 # scalar added to each sum
    for i in range(len(masks)):
        red_map = np.zeros_like(masks[i]).astype(np.uint8)
        green_map = np.zeros_like(masks[i]).astype(np.uint8)
        blue_map = np.zeros_like(masks[i]).astype(np.uint8)

        # apply a random color mask to each object
        color = COLORS[random.randrange(0, len(COLORS))]
        red_map[masks[i] == 1], green_map[masks[i] == 1], blue_map[masks[i] == 1]  = color
        # combine all the masks into a single image
        segmentation_map = np.stack([red_map, green_map, blue_map], axis=2)
        # convert the original PIL image into NumPy format
        image = np.array(image)
        # convert from RGB to OpenCV BGR format
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        # apply mask on the image
        cv2.addWeighted(image, alpha, segmentation_map, beta, gamma, image)

        # draw the bounding boxes around the objects
        cv2.rectangle(image, boxes[i][0], boxes[i][1], color=color, 
                      thickness=2)
        # put the label text above the objects
        cv2.putText(image , labels[i], (boxes[i][0][0], boxes[i][0][1]-10), 
                    cv2.FONT_HERSHEY_SIMPLEX, 1, color, 
                    thickness=2, lineType=cv2.LINE_AA)
    
    return image

  • First, we have alpha, beta, and gamma. Here, alpha and beta define the weights of the original image and the segmentation map when we overlay the segmentation map on the image. gamma is a scalar that is added to each sum, and keeping it at 0 is ideal in almost all cases. You can find more details in the OpenCV documentation for addWeighted(). The sketch after this list shows the equivalent per-pixel operation.
  • Then we start a for loop over the number of masks that we have.
  • Inside the loop, we first define three NumPy arrays containing only zeros, whose dimensions match those of the current mask.
  • Then we obtain a random color tuple from the COLORS list.
  • The next line applies that color to the pixels where the current mask is 1, so that a colored mask is created. Now the NumPy arrays contain some color rather than only being black.
  • After that, we stack the red_map, green_map, and blue_map to obtain the complete segmentation map for the current object.
  • We then convert the original image from the PIL Image format to NumPy format and convert it from RGB to the OpenCV BGR color format.
  • Next, cv2.addWeighted() combines the image and the segmentation map. Basically, we overlay the segmentation map with a weight of 0.6 on the original image. This gives us a bit of a translucent map on the image, and we can easily infer what object is actually there.
  • The last two OpenCV calls draw the bounding box and put the label name for the current object.
  • Finally, we return the resulting image.

The above two functions were the most important parts of this tutorial. If you are with me till now, then the rest of the article is pretty easy to follow along.

Applying Mask R-CNN on Images

Now, we will write the code to apply the Mask R-CNN model to images of our choice. This part is going to be pretty easy, as we have already written most of our logic in the utils.py script.

All of this code will go into the mask_rcnn_images.py file.

Let’s start with the imports that we need.

import torch
import torchvision
import cv2
import argparse

from PIL import Image
from utils import draw_segmentation_map, get_outputs
from torchvision.transforms import transforms as transforms

We will be providing the path to the input image using command line arguments. So, let’s define our argument parser now.

parser = argparse.ArgumentParser()
parser.add_argument('-i', '--input', required=True, 
                    help='path to the input data')
parser.add_argument('-t', '--threshold', default=0.965, type=float,
                    help='score threshold for discarding detection')
args = vars(parser.parse_args())

We also have the optional threshold score in the above code block. By default, we will discard any detections that have a score lower than 0.965. If you want, you may either increase or decrease the value. Keep in mind that increasing the value too much might lead to objects not being detected, while decreasing it too much might lead to many false positives.
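
For example, once the script is complete, you will be able to override the default threshold from the command line like this (using the first tutorial image as a placeholder):

python mask_rcnn_images.py --input ../input/image1.jpg --threshold 0.9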

Prepare the Model and Define the Transform

The next step is preparing our Mask R-CNN model.

# initialize the model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True, progress=True, 
                                                           num_classes=91)
# set the computation device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# load the model onto the computation device and set it to eval mode
model.to(device).eval()

In the code above, we initialize the model. Note that we have provided the pretrained argument as True. Then we load the model onto the computation device and put it into eval() mode. A GPU is not strictly necessary here, as we will be working with images only, but it is better if you have one.

The following block of code defines the transforms that we will apply to the images.

# transform to convert the image to tensor
transform = transforms.Compose([
    transforms.ToTensor()
])

We are just converting the images to tensors. We do not need to apply any other transform to the images before feeding them to the Mask R-CNN model.
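
As a quick check (a sketch only, not part of the final script), applying this transform to a PIL image yields a float tensor of shape [C, H, W] with values in [0, 1], which is exactly what the model expects before batching:

from PIL import Image
img = Image.open('../input/image1.jpg').convert('RGB')
tensor = transform(img)
print(tensor.shape, tensor.min().item(), tensor.max().item())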

Read the Image and Apply Instance Segmentation

We will be providing the path to the image as a command line argument. So, we will read the image path from there. The next block of code reads the image and applies instance segmentation to it using the Mask R-CNN model.

image_path = args['input']
image = Image.open(image_path).convert('RGB')
# keep a copy of the original image for OpenCV functions and applying masks
orig_image = image.copy()

# transform the image
image = transform(image)
# add a batch dimension
image = image.unsqueeze(0).to(device)

masks, boxes, labels = get_outputs(image, model, args['threshold'])

result = draw_segmentation_map(orig_image, masks, boxes, labels)

# visualize the image
cv2.imshow('Segmented image', result)
cv2.waitKey(0)

# set the save path
save_path = f"../outputs/{args['input'].split('/')[-1].split('.')[0]}.jpg"
cv2.imwrite(save_path, result)

  • First, we capture the image path and read the image using PIL, converting it to RGB. We also keep a copy of the original untransformed image for the OpenCV functions and for applying the masks.
  • Then we apply the transform and add an extra batch dimension before moving the tensor to the computation device.
  • Next, we call the get_outputs() function of the utils script. Here we feed the transformed image to the Mask R-CNN model, and this returns the masks, boxes, and labels.
  • After that, we call the draw_segmentation_map() function, which overlays the segmentation mask for each object on the original image.
  • Then we visualize the resulting image on the screen.
  • Finally, we create a save_path name from the original input path and save the resulting image to disk. A slightly more portable way to build this path is sketched below.
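
Note that save_path above is built by splitting the input path on '/'. If you happen to pass a Windows-style path with back-slashes, that split will not isolate the file name. A small alternative sketch (hypothetical, not part of the tutorial script) using os.path would be:

import os

# build the output path from just the input file name, independent of '/' or '\' separators
file_name = os.path.splitext(os.path.basename(args['input']))[0]
save_path = os.path.join('..', 'outputs', f"{file_name}.jpg")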

This is all the code we need to apply Mask R-CNN deep learning instance segmentation model to images. We are all set to execute our code and see the results.

Execute the mask_rcnn_images.py File

Let’s see how well the Mask R-CNN model is able to detect and segment objects in images. If you are using the downloaded images, then make sure that you have unzipped the file and extracted its content into the input folder. It is all good if you wish to use your own images as well.

We will start with the first image from the input folder. Open up your terminal/command prompt and cd into the src directory of the project. Then type the following command.

python mask_rcnn_images.py --input ../input/image1.jpg

The following is the resulting segmented image.

Instance segmentation using PyTorch and Mask R-CNN
Figure 3. Instance segmentation using PyTorch and Mask R-CNN. We can see that the Mask R-CNN model is able to detect and segment different objects in the image pretty well.

Looks like the model is really working well. Along with all the humans in the image, it is also able to detect and segment the laptop and the potted plant. Still, the Mask R-CNN model is not able to detect the hand of the woman in the middle completely. Apart from that, all other detections and segmentations look really nice.

Now, let’s try something which does not contain any human being.

python mask_rcnn_images.py --input ../input/image2.jpg

Instance segmentation using PyTorch and Mask R-CNN.
Figure 4. Instance segmentation using PyTorch and Mask R-CNN. The Mask R-CNN model is even able to detect the elephant at the far left corner which is partially visible.

In figure 4, we can see that the Mask R-CNN model is able to detect and segment the elephants really well. It is even able to detect and segment a partially visible elephant at the far left side.

Until now, everything is working fine. Now, let’s see a case where the Mask R-CNN model fails to some extent. Let’s try the model on the third image.

python mask_rcnn_images.py --input ../input/image3.jpg

Instance segmentation using PyTorch and Mask R-CNN
Figure 5. Instance segmentation using PyTorch and Mask R-CNN. This is where the Mask R-CNN deep learning model fails to some extent. It is unable to properly segment people when they are too close together.

Figure 5 shows some major flaws of the Mask R-CNN model. It fails when it has to segment a group of people close together. Interestingly, the detections are all perfect. But the model fails in segmenting the boy next to the soldier, the boy on the far right, and the leg of the soldier properly. So, it fails to segment when objects are very close to each other.

If you want, you can also try some more images and share your findings in the comment section.

Summary and Conclusion

In this article, you learned about instance segmentation in deep learning. You got hands-on experience by applying instance segmentation on images using the PyTorch Mask R-CNN model. I hope that you have learned something new from this tutorial.

If you have any doubts, thoughts, or suggestions, then please leave them in the comment section. I will surely address them.

You can contact me using the Contact section. You can also find me on LinkedIn, and Twitter.

41 thoughts on “Instance Segmentation with PyTorch and Mask R-CNN”

  1. Sonali Ajankar says:

    Dear Sovit,
    Thank you for such helpful tutorials.
    Can you please upload maskrcnn PyTorch code for showing keypoints and mask output simultaneously?

    1. Sovit Ranjan Rath says:

      I am happy that the post was helpful.
      I not sure whether MaskRCNN also outputs keypoints. But yes, KeyPoint RCNN does output both bounding boxes and keypoints. So, maybe I can post one on that. I hope that is fine.

      1. mark says:

        Sir I am reading through the blog now. I am not done yet, but so far what im reading is brilliant and I am learning. Thank you Sir for sharing your knowledge with the world.

        1. Sovit Ranjan Rath says:

          Thank you Mark.

  2. vajrayudha says:

    Hello Sovit,
    Thank you for the wonderful blog post.
    Could you please make a post on occlusion detection using PyTorch?

    1. Sovit Ranjan Rath says:

      Hello vajrayudha. Thank you for the appreciation. Can you please tell me specifically what do you mean by occlusion detection using deep learning/PyTorch as this might be related to some domain as well?

  3. Cilsya says:

    Can you supply the source code, the versions of the modules, what version of python, and the environment (windows? linux? mac?) you are using. As well as the hardware (i.e. are you using GPU, if so, what model.)

    I am on Windows 10 with PyTorch v1.8 stable, Cuda 11.1, GPU enabled with Titan RTX, Python 3.9.

    I followed along and I get this error.

    #——————————-
    Traceback (most recent call last):
    File “C:\code\src\mask_rcnn_images.py”, line 38, in
    result = draw_segmentation_map(orig_image, masks, boxes, labels)
    File “C:\code\src\src\utils.py”, line 52, in draw_segmentation_map
    cv2.rectangle(image, boxes[i][0], boxes[i][1], color=color,
    TypeError: argument for rectangle() given by name (‘color’) and position (3)
    #——————————-

    My modules installed in my conda virtual environment.
    # Name Version Build Channel
    argon2-cffi 20.1.0 py39h2bbff1b_1
    async_generator 1.10 pyhd3eb1b0_0
    attrs 20.3.0 pyhd3eb1b0_0
    backcall 0.2.0 pyhd3eb1b0_0
    blas 2.108 mkl conda-forge
    blas-devel 3.9.0 8_mkl conda-forge
    bleach 3.3.0 pyhd3eb1b0_0
    ca-certificates 2021.1.19 haa95532_1
    certifi 2020.12.5 py39haa95532_0
    cffi 1.14.5 py39hcd4344a_0
    colorama 0.4.4 pyhd3eb1b0_0
    cudatoolkit 11.1.1 heb2d755_7 conda-forge
    decorator 4.4.2 pyhd3eb1b0_0
    defusedxml 0.7.1 pyhd3eb1b0_0
    entrypoints 0.3 py39haa95532_0
    freetype 2.10.4 h546665d_1 conda-forge
    icu 58.2 ha925a31_3
    importlib-metadata 2.0.0 py_1
    importlib_metadata 2.0.0 1
    intel-openmp 2020.3 h57928b3_311 conda-forge
    ipykernel 5.3.4 py39h7b7c402_0
    ipython 7.21.0 py39hd4e2768_0
    ipython_genutils 0.2.0 pyhd3eb1b0_1
    ipywidgets 7.6.3 pyhd3eb1b0_1
    jedi 0.17.2 py39haa95532_1
    jinja2 2.11.3 pyhd3eb1b0_0
    jpeg 9b hb83a4c4_2
    jsonschema 3.2.0 py_2
    jupyter 1.0.0 py39haa95532_7
    jupyter_client 6.1.7 py_0
    jupyter_console 6.2.0 py_0
    jupyter_core 4.7.1 py39haa95532_0
    jupyterlab_pygments 0.1.2 py_0
    jupyterlab_widgets 1.0.0 pyhd3eb1b0_1
    libblas 3.9.0 8_mkl conda-forge
    libcblas 3.9.0 8_mkl conda-forge
    liblapack 3.9.0 8_mkl conda-forge
    liblapacke 3.9.0 8_mkl conda-forge
    libpng 1.6.37 h1d00b33_2 conda-forge
    libsodium 1.0.18 h62dcd97_0
    libtiff 4.1.0 h56a325e_1
    libuv 1.41.0 h8ffe710_0 conda-forge
    lz4-c 1.9.3 h8ffe710_0 conda-forge
    m2w64-gcc-libgfortran 5.3.0 6 conda-forge
    m2w64-gcc-libs 5.3.0 7 conda-forge
    m2w64-gcc-libs-core 5.3.0 7 conda-forge
    m2w64-gmp 6.1.0 2 conda-forge
    m2w64-libwinpthread-git 5.0.0.4634.697f757 2 conda-forge
    markupsafe 1.1.1 py39h2bbff1b_0
    mistune 0.8.4 py39h2bbff1b_1000
    mkl 2020.4 hb70f87d_311 conda-forge
    mkl-devel 2020.4 h57928b3_312 conda-forge
    mkl-include 2020.4 hb70f87d_311 conda-forge
    msys2-conda-epoch 20160418 1 conda-forge
    nbclient 0.5.3 pyhd3eb1b0_0
    nbconvert 6.0.7 py39haa95532_0
    nbformat 5.1.2 pyhd3eb1b0_1
    nest-asyncio 1.5.1 pyhd3eb1b0_0
    ninja 1.10.2 h5362a0b_0 conda-forge
    notebook 6.2.0 py39haa95532_0
    numpy 1.20.1 py39h6635163_0 conda-forge
    olefile 0.46 pyh9f0ad1d_1 conda-forge
    opencv-python 4.5.1.48 pypi_0 pypi
    openssl 1.1.1j h2bbff1b_0
    packaging 20.9 pyhd3eb1b0_0
    pandoc 2.11 h9490d1a_0
    pandocfilters 1.4.3 py39haa95532_1
    parso 0.7.0 py_0
    pickleshare 0.7.5 pyhd3eb1b0_1003
    pillow 8.1.2 py39h4fa10fc_0
    pip 21.0.1 py39haa95532_0
    prometheus_client 0.9.0 pyhd3eb1b0_0
    prompt-toolkit 3.0.8 py_0
    prompt_toolkit 3.0.8 0
    pycparser 2.20 py_2
    pygments 2.8.1 pyhd3eb1b0_0
    pyparsing 2.4.7 pyhd3eb1b0_0
    pyqt 5.9.2 py39hd77b12b_6
    pyrsistent 0.17.3 py39h2bbff1b_0
    python 3.9.2 h6244533_0
    python-dateutil 2.8.1 pyhd3eb1b0_0
    python_abi 3.9 1_cp39 conda-forge
    pytorch 1.8.0 py3.9_cuda11.1_cudnn8_0 pytorch
    pywin32 228 py39he774522_0
    pywinpty 0.5.7 py39haa95532_0
    pyzmq 20.0.0 py39hd77b12b_1
    qt 5.9.7 vc14h73c81de_0
    qtconsole 5.0.2 pyhd3eb1b0_0
    qtpy 1.9.0 py_0
    send2trash 1.5.0 pyhd3eb1b0_1
    setuptools 52.0.0 py39haa95532_0
    sip 4.19.13 py39hd77b12b_0
    six 1.15.0 py39haa95532_0
    sqlite 3.33.0 h2a8f88b_0
    svgwrite 1.4.1 pypi_0 pypi
    terminado 0.9.2 py39haa95532_0
    testpath 0.4.4 pyhd3eb1b0_0
    tk 8.6.10 h8ffe710_1 conda-forge
    torchaudio 0.8.0 py39 pytorch
    torchvision 0.9.0 py39_cu111 pytorch
    tornado 6.1 py39h2bbff1b_0
    traitlets 5.0.5 pyhd3eb1b0_0
    typing_extensions 3.7.4.3 py_0 conda-forge
    tzdata 2020f h52ac0ba_0
    vc 14.2 h21ff451_1
    vs2015_runtime 14.27.29016 h5e58377_2
    wcwidth 0.2.5 py_0
    webencodings 0.5.1 py39haa95532_1
    wheel 0.36.2 pyhd3eb1b0_0
    widgetsnbextension 3.5.1 py39haa95532_0
    wincertstore 0.2 py39h2bbff1b_0
    winpty 0.4.3 4
    xz 5.2.5 h62dcd97_1 conda-forge
    zeromq 4.3.3 ha925a31_3
    zipp 3.4.0 pyhd3eb1b0_0
    zlib 1.2.11 h62dcd97_4
    zstd 1.4.9 h6255e5f_0 conda-forge

    How do I fix my error?

    1. Sovit Ranjan Rath says:

      Hello Cilsya. I am really sorry you are facing this issue. Obviously, I can’t through each package that you have mentioned here. I use Windows (currently planning to move all my ML workflow to Ubuntu) and the code is based on Python 3.7.x. I have a GTX 1060. But that should not matter much. One thing I can suggest right now is to use Python 3.7 and check again. Most probably, the issue will be solved. Please reach back with updates.

    2. Gesem Mejia says:

      I figured out that the issue is because in the cv2 rect function has as inputs the box coordinates as float tensors, i solved it by getting the item of each tensor and casting to integer, i dont know if there is a more efficient way, but it worked out.
      u = boxes[i][0][0].item(), boxes[i][0][1].item()
      v = boxes[i][1][0].item(), boxes[i][1][1].item()
      x_min, y_min = int(u[0]), int(u[1])
      x_max, y_max = int(v[0]), int(v[1])
      # draw the bounding boxes around the objects

      cv2.rectangle(image, (x_min, y_min), (x_max, y_max), color=color, thickness=2)

      1. Sovit Ranjan Rath says:

        Hello Gesem. Thank you so much for pointing that out. The odd thing is I did not face that issue while executing the code myself. Although, in hindsight, I should have got that error. I will again run the code. If I face the issue, then I will update it.

      2. Gülnur Sezen says:

        Could you share the corrected version of the code, please? I am getting the same error. I did what you said but it still hasn’t improved.

        1. Sovit Ranjan Rath says:

          It will be done within a few hours. Please check by end of the day (May 5, 2021). Sorry for the delay.
          Edit: I have changed the code such that all the bounding box coordinates are integers by default. Please check again. I hope that it will run fine now.

  4. geonho says:

    Hi Thank you for your code.
    It is so helpful for me.
    I have a one question about custom data.
    I downloaded picture in google and then moved this file to input folder.
    But model didn’t work and gave error code.
    This is code what I received File “mask_rcnn_image.py”, line 33, in
    cv2.imshow(‘Segmented image’,result)
    cv2.error: OpenCV(4.5.2) :-1: error: (-5:Bad argument) in function ‘imshow’
    > Overload resolution failed:
    > – mat is not a numpy array, neither a scalar
    > – Expected Ptr for argument ‘mat’
    > – Expected Ptr for argument ‘mat’

    Can you tell me what I have to do?

    1. Sovit Ranjan Rath says:

      Just check that the image from Google might have 4 color channels instead of 3. After reading the image try this
      image = image[:. :, :3]
      Hope this helps.

      1. geon says:

        Thank you sir I resolved my problem.
        I have another question.
        There is no box labels but I want to segmentation boxes.
        How could I solve this problem?

        1. Sovit Ranjan Rath says:

          Hi. Generally, segmentation labels are only present for detected objects. I have never experimented with this before, what you are trying to do, will have to research a bit. But most probably, only detected objects have segmentation labels.

          1. geon says:

            Thank you for your answer.
            It was very helpful to implement the model that I made.

  5. Neelam Jain says:

    In the code i could see that the data for boxes and masks is filtered below a specific threshold. But i could not see any such operations for labels. Although, i did not experience any issue. But i would think this may lead to some wrong labels, as the sequence may not match between masks/bounding box and labels

    1. Sovit Ranjan Rath says:

      Hi Neelam. You are right, I had not filtered the labels. Actually while drawing the bounding boxes, we are only iterating over the boxes, so whenever that list ends, we stop. But still, you are right, we should filter everything. Thanks for the feedback. Will take care of it further on.

  6. Geetu says:

    Hi ,

    I am getting error like

    File “C:\Users\Hannah\anaconda3\envs\py36\lib\site-packages\torchvision\models\detection\transform.py”, line 41, in forward
    images[i] = image

    RuntimeError: The expanded size of the tensor (1280) must match the existing size (1333) at non-singleton dimension 2. Target sizes: [3, 720, 1280]. Tensor sizes: [3, 749, 1333]

    How to solve?

    1. Sovit Ranjan Rath says:

      Hello. Is it an error from my code? If so, can you please let me which Python file and line number are giving the error?

      1. Geetu says:

        Hi,
        Yes it is from this code , line no 60
        masks, boxes, labels = get_outputs(image, model, 0.5)

        Image dim showing as 4. I tried to convert to 3 by using image = image[:. :, :3] as you mentioned in above issue but it is showing syntax error

        1. Sovit Ranjan Rath says:

          It seems that you are using an image with an alpha channel. I might be wrong though. Does the error show when using any of the images provided with this post? Can you please try and download a few new images from the internet and confirm they are RGB images with three channels before runnign the script.

  7. Divyasha says:

    Hello,
    when my google colab and google drive are connected. Now if I want run this following command, what would be exactly the path for image if accessing from drive.

    python mask_rcnn_images.py --input ../input/image1.jpg

    I am getting error as

    !pip install -e
    import os
    from google.colab import drive
    drive.mount(‘/content/drive’)

    os.chdir(“/content/drive/MyDrive/MaskRCNN_Pytorch/”)
    !python /content/drive/MyDrive/MaskRCNN_Pytorch/src/mask_rcnn_images.py --input /content/drive/MyDrive/MaskRCNN_Pytorch/input/image2.jpg

    Thanks in advance.

  8. Sovit Ranjan Rath says:

    Hi Divyasha.
    Can you please let me know what is the error that you are getting?
    Also another thing that you can do is create the Python file in Colab directly and execute it from the Colab’s directory.
    Like:
    !python mask_rcnn_images.py --input path/to/your/image
    That way you might be able to debug it faster. Let me know what error you are facing.

  9. David Traparic says:

    That works great thank you ! Some minor errors I wanted to report :
    utils.py:line 20
    masks = (outputs[0][‘masks’]>0.5).squeeze().detach().cpu().numpy()

    squeeze() should be squeeze(1) : as the masks have dimensions (n, c, h, w), with c=1 (by definition because it’s a mask), and thus it’s the dimension that we want to squeeze. I have encountered a case where only one mask was detected, dimensions are then (1, 1, height, width) and squeeze() then reduce “n” and “c” dimensions.

    utils.py:line 47 and 49
    image = np.array(image)
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

    These two lines should be just before the “for” loop :
    as “image” is a variable living outside the scope of the “for” loop (in order to add one by one the detected object masks in transparency), cv2 would convert at each loop iteration the image from RGB to BGR (or BGR to RGB as the permutation is the same), resulting in doing this channel permutation twice if an even number of objects are detected. The image saved can then be in a wrong channel arrangement with weird colors.

    I take the time to write this comment, precisely because this is the clearest tutorial I have seen and it helped me a lot, so I wanted to thank you properly <3

    1. Sovit Ranjan Rath says:

      Hello David. Really appreciate the effort you put into all these little details. I will try to update the post as soon as possible.

  10. Sylvain Ard says:

    Hi,
    I am trying to execute your code with my own pretrained model with these lines :
    model = torch.load(“model_folioles.pth”,map_location=torch.device(‘cpu’))
    # set the computation device
    device = torch.device(‘cpu’)
    # load the modle on to the computation device and set to eval mode
    model.to(device).eval()
    but I have the following error :
    Traceback (most recent call last):
    File “demo.py”, line 14, in
    model.to(device).eval()
    AttributeError: ‘dict’ object has no attribute ‘to’

    please help me !
    best regards

  11. Sylvain Ard says:

    Hello,
    is exists a model for mask_rcnn_X_101_32x8d_FPN_3x.yaml models( in detectron2) in torchvision.models.detection and if yes what please ?
    thank you
    best regards

    1. Sovit Ranjan Rath says:

      Hello Sylvian. I am not sure whether Detectron 2 models will work with this code. One main reason is that the code in this post loads the specific models and weights and I don’t think that happens through a YAML file which is the case for Detectron 2.

  12. Andrew says:

    Hello,
    Didn’t you try to do inference using libtorch and scripting model with
    torch.jit.script?

    1. Sovit Ranjan Rath says:

      Hello Andrew. I have not tried that yet. But will try to cover that in the future.

      1. Andrew says:

        If you have any ideas how to manage that, I could check it and share results. Now I’m trying to make scripted Mask R-CNN model from torchvision, with no success – it doesn’t open by Netron and doesn’t load wit torch::jit::load in c++.

        1. Sovit Ranjan Rath says:

          I will try to check that out but may need some time. Mostly, I don’t work with C++ but will try to figure that out.

    2. Embabazi says:

      How can i contact you? I need your help with Mask R-CNN

      1. Sovit Ranjan Rath says:

        Hi. You can reach out to me on [email protected]

  13. Sergey NSD says:

    Dear Sovit!
    Thank you for your articles and lessons.

    Some disadvantages of the Mask R-CNN model have been removed in the newer version of Mask R-CNN v2.
    Added the weights COCO_V1 (they are used by default).

    from torchvision.models.detection import MaskRCNN_ResNet50_FPN_V2_Weights as Weights_2

    model = torchvision.models.detection.maskrcnn_resnet50_fpn_v2(
    weights=Weights_2.DEFAULT,
    pretrained=True,
    progress=True,
    num_classes=91)

    Now the model correctly segments the soldier and the boys.
    https://drive.google.com/file/d/1F7PI68rCVtfiSR40Lf0aLOav2hXuM-7R/view?usp=sharing

    1. Sovit Ranjan Rath says:

      Thanks for the insights Sergey.
