In this article, you will get hands-on experience with instance segmentation using PyTorch and Mask R-CNN. Image segmentation is one of the major application areas of deep learning and neural networks. One of the best known deep learning based image segmentation techniques is semantic segmentation. In semantic segmentation, we mask each class in an image with a single color, so different classes get different colored masks. You can learn more about semantic segmentation in one of my previous articles. In this article, however, we will focus on instance segmentation in deep learning using PyTorch and Mask R-CNN.
Take a look at the following image to get a better idea of instance segmentation.
Figure 1 shows how, in the image on the left, every person has a different color mask even though they all belong to the person class. Similarly, all the sheep are also masked with different colors.
In this article, we will try to apply instance segmentation and achieve similar results as the above.
So, what will we be learning in this article?
- We will not train our instance segmentation model in this tutorial. Instead, we will use the PyTorch Mask R-CNN model which has been trained on the COCO dataset.
- We will start by learning a bit more about the Mask R-CNN model. Specifically, we will get to know about the input and output format of the model.
- Then we will dive into the coding part with a very detailed explanation.
- Finally, we will test the Mask R-CNN deep learning model by applying it to images.
PyTorch Mask R-CNN Deep Learning Model
Before moving into the input and output format of the Mask R-CNN model, let’s see what it actually does and how it does it.
The Working of Mask R-CNN Model
Let’s go over the working of Mask R-CNN and deep learning instance segmentation very briefly here.
We know that in semantic segmentation each class in an image has a single color mask. But in instance segmentation, each instance of a class gets a different color mask.
How do we achieve this then? In simple terms, we detect each object present in an image, get its bounding box, classify the object inside the bounding box, and mask it with a unique color. So, instance segmentation is a combination of object detection and image segmentation. It sounds simple, but in practice and training it can quickly become complicated. The Mask R-CNN model employs this same approach.
What you see in figure 2 is an example of instance segmentation. You can see that each object is being detected and then a color mask is applied on it.
In fact, Mask R-CNN combines the very famous Faster R-CNN deep learning object detector with a mask prediction branch for image segmentation. We will not go into the technical details of the model here, but I highly recommend that you read the original Mask R-CNN paper. And if you want to know more about image segmentation in general, then I recommend that you read one of my previous articles on image segmentation. It covers a lot of general things like evaluation metrics, some major papers, and application areas of deep learning based image segmentation.
We need not worry much about the technical details of training such a model here, since we will be using a pre-trained model provided by PyTorch. It is far more useful to understand the input and output format of that pre-trained model, as this is what we need for inference and coding.
The Input and Output Format of PyTorch Mask R-CNN Model
The Mask R-CNN pre-trained model that PyTorch provides has a ResNet-50-FPN backbone.
The model expects images in batches for inference, and all the pixel values should be within the range [0, 1]. So, the input format to the model will be [N, C, H, W]. Here, N is the number of images or the batch size, C is the color channel dimension, and H and W are the height and width of the image respectively. It is quite simple and in the typical PyTorch format as well.
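To make this format concrete, here is a minimal sketch (the file name example.jpg is just a placeholder) of turning a single image into such a batch:

from PIL import Image
from torchvision import transforms

transform = transforms.ToTensor()  # scales pixel values to the [0, 1] range
image = Image.open('example.jpg').convert('RGB')  # 'example.jpg' is a placeholder path
image_tensor = transform(image)    # shape: [C, H, W], here [3, H, W]
batch = image_tensor.unsqueeze(0)  # shape: [N, C, H, W] with N = 1
print(batch.shape, batch.min().item(), batch.max().item())

This is essentially what we will do later in the inference script as well.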
The model outputs a lot of content though. Remember, it is a combination of object detection and image segmentation. During inference, the model outputs a list of dictionaries containing the resulting tensors. Formally, the output is a List[Dict[Tensor]]. The following are the contents of each dictionary, which I have taken from the PyTorch models website.
- boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with values of x between 0 and W and values of y between 0 and H.
- labels (Int64Tensor[N]): the predicted labels for each image.
- scores (Tensor[N]): the scores of each prediction.
- masks (UInt8Tensor[N, 1, H, W]): the predicted masks for each instance, in the 0-1 range. In order to obtain the final segmentation masks, the soft masks can be thresholded, generally with a value of 0.5 (mask >= 0.5).
So, each dictionary contains four keys: boxes, labels, scores, and masks. These keys hold the resulting tensors as values. And notice that we should only keep the mask values which are greater than or equal to 0.5.
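As a quick illustration of this structure, here is a minimal sketch (assuming model is the pre-trained Mask R-CNN that we load later in this post and batch is an input prepared as shown above) that inspects the dictionary for the first image and thresholds its soft masks:

import torch

model.eval()
with torch.no_grad():
    outputs = model(batch)   # List[Dict[Tensor]], one dict per input image

output = outputs[0]
print(output.keys())             # boxes, labels, scores, masks
print(output['boxes'].shape)     # [N, 4] boxes in [x1, y1, x2, y2] format
print(output['scores'])          # confidence score for each detection
binary_masks = output['masks'] > 0.5  # threshold the soft masks to get binary masks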
I hope that the above details make some of the technicalities clearer. If not, things will become much clearer once we actually code our way through. Coding and applying the Mask R-CNN model to images will help us understand its working even better. So, let’s move further.
Project Directory Structure
Here, we will get to know about the project’s directory structure. I hope that you follow the same structure as in this tutorial, so that you can move on without any difficulty. The following is the directory structure that we will follow.
├───input
│   image1.jpg
│   image2.jpg
│   image3.jpg
│
├───outputs
│
└───src
    coco_names.py
    mask_rcnn_images.py
    utils.py
So, we have three folders.
- The input folder contains the images that we will test the Mask R-CNN model on.
- The outputs folder will contain all the segmented images after they have passed through the Mask R-CNN model.
- And finally, we have the src folder which will contain the Python scripts.
You are free to use any image of your choice to run inference using the Mask R-CNN model. However, if you want to use the same images as in this tutorial, then you can download the zipped input file below. The images have been taken from Pixabay.
After downloading, extract the files inside the parent project directory.
Libraries That We Need
PyTorch is the only major library that we need for this tutorial. I have used PyTorch 1.6 for this project. So, you can go ahead and download PyTorch if you have not done so.
All the other libraries are common deep learning and computer vision libraries which you probably already have. If not, feel free to install them along the way.
Instance Segmentation using PyTorch and Mask R-CNN
From this section onward, we will start to write the code for instance segmentation on images using PyTorch and Mask R-CNN.
Let’s begin with defining all the COCO dataset’s class names in a Python script.
The COCO Dataset Class Names
We will keep all the class names separate from the other Python code so that our code remains clean.
Create a coco_names.py script inside the src folder and put the following list into it.
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A',
    'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
That’s all we need for this Python script. We will import this wherever we need it.
Writing Some Utility Functions for Instance Segmentation
Now, let’s set up the utility script which will help us a lot in this tutorial. It will contain the important functions, such as the forward pass of the image through the model and applying the segmentation mask to the image.
Things will become clearer when we write the code. So, let’s jump straight into it.
All of this code will go into the utils.py script inside the src folder.
The following are the imports that we need.
import cv2
import numpy as np
import random
import torch
from coco_names import COCO_INSTANCE_CATEGORY_NAMES as coco_names
Note that we are importing the COCO_INSTANCE_CATEGORY_NAMES list from coco_names.py.
We have a total of 91 classes for segmentation and detection. And we want every detected object, regardless of its class, to have a different color mask.
We need to generate a different RGB tuple for each of the detected objects in an image. The following simple line of code will do that for us.
# this will help us create a different color for each class
COLORS = np.random.uniform(0, 255, size=(len(coco_names), 3))
We can use the colors generated above in the OpenCV drawing functions.
Function to Get the Outputs
We will write a simple function to get the outputs from the model after inference. This function will provide us with all the output tensors that we need for proper visualization of the results. Let’s call this function get_outputs().
The following is the function definition.
def get_outputs(image, model, threshold):
    with torch.no_grad():
        # forward pass of the image through the model
        outputs = model(image)
    # get all the scores
    scores = list(outputs[0]['scores'].detach().cpu().numpy())
    # indices of those scores which are above a certain threshold
    thresholded_preds_indices = [scores.index(i) for i in scores if i > threshold]
    thresholded_preds_count = len(thresholded_preds_indices)
    # get the masks
    masks = (outputs[0]['masks'] > 0.5).squeeze(1).detach().cpu().numpy()
    # discard masks for objects which are below the threshold
    masks = masks[:thresholded_preds_count]
    # get the bounding boxes, in (x1, y1), (x2, y2) format
    boxes = [[(int(i[0]), int(i[1])), (int(i[2]), int(i[3]))]
             for i in outputs[0]['boxes'].detach().cpu()]
    # discard bounding boxes below the threshold value
    boxes = boxes[:thresholded_preds_count]
    # get the class labels
    labels = [coco_names[i] for i in outputs[0]['labels']]
    return masks, boxes, labels
The get_outputs() function accepts three input parameters. The first one is the input image, the second is the Mask R-CNN model, and the third is the threshold value. The threshold is a pre-defined score below which we discard all detections to avoid too many false positives. Let’s go over the code step by step.
- At line 12, we obtain the outputs by forward passing the image through the model. It provides us with a list containing a dictionary.
- At line 15, we get all the scores from the dictionary and load them onto the CPU.
- Then at line 17, we have thresholded_preds_indices. This contains all the indices from scores which are above the threshold that we have provided.
- Getting the length of that list in thresholded_preds_count will help us extract only the masks and bounding boxes that correspond to those scores.
- At line 20, we obtain the masks which are greater than or equal to 0.5.
- Line 22 discards all the masks that are not within the threshold score. We only retain those masks which are at least above the threshold score.
- At lines 25 and 27, we obtain the bounding boxes and similarly discard all the low scoring box detections as we did in the case of masks.
- And line 30 gets all the COCO dataset label names by mapping the resulting labels indices to the coco_names list.
- Finally, we return the masks, boxes, and labels.
I hope that you were able to understand the above steps. Try going over those again and you will get them for sure.
Applying Segmentation and Drawing Bounding Boxes
Once we have the labels, masks, and bounding boxes, we can apply the color masks to the objects and draw the bounding boxes as well.
We will again write a very simple function for that. The function is draw_segmentation_map(), which accepts four input parameters: image, masks, boxes, and labels. The image is the original image on which we will apply the resulting masks and draw the bounding boxes around the detected objects. The labels will help us put the class name on top of each object.
The following is the function definition.
def draw_segmentation_map(image, masks, boxes, labels):
    alpha = 1
    beta = 0.6  # transparency for the segmentation map
    gamma = 0   # scalar added to each sum
    for i in range(len(masks)):
        red_map = np.zeros_like(masks[i]).astype(np.uint8)
        green_map = np.zeros_like(masks[i]).astype(np.uint8)
        blue_map = np.zeros_like(masks[i]).astype(np.uint8)
        # apply a random color mask to each object
        color = COLORS[random.randrange(0, len(COLORS))]
        red_map[masks[i] == 1], green_map[masks[i] == 1], blue_map[masks[i] == 1] = color
        # combine all the masks into a single image
        segmentation_map = np.stack([red_map, green_map, blue_map], axis=2)
        # convert the original PIL image into NumPy format
        image = np.array(image)
        # convert from RGB to OpenCV BGR format
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        # apply mask on the image
        cv2.addWeighted(image, alpha, segmentation_map, beta, gamma, image)
        # draw the bounding boxes around the objects
        cv2.rectangle(image, boxes[i][0], boxes[i][1], color=color, thickness=2)
        # put the label text above the objects
        cv2.putText(image, labels[i], (boxes[i][0][0], boxes[i][0][1]-10),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, color, thickness=2, lineType=cv2.LINE_AA)
    return image
- First, we have alpha, beta, and gamma. Here, alpha and beta define the weights of the original image and the segmentation map when we overlay the segmentation map on the image. gamma is a scalar that is added to each weighted sum, and keeping it at 0 is ideal in almost all cases (a short standalone example follows this list). You can find more details in the OpenCV documentation.
- From line 36, we start a for loop over the number of masks that we have.
- At lines 37, 38, and 39, we define three NumPy arrays containing only zeros whose dimensions match those of the current mask.
- Then at line 42, we obtain a random color tuple from the COLORS list.
- Line 43 applies the above obtained color to the object so that a mask is created. Now, the NumPy arrays contain some color rather than only being black.
- Line 45 stacks the red_map, green_map, and blue_map to obtain the complete segmentation map for the current object.
- At line 47, we convert the original image from the PIL Image format to NumPy format and then convert it to OpenCV BGR color format at line 49.
- At line 51, we combine the image and the segmentation map. Basically, we overlay the segmentation map with a weight of 0.6 on the original image. This gives us a slightly translucent map on the image, so we can easily see which object is actually there.
- Lines 54 and 57 draw the bounding box and put the label name for the current object respectively.
- Finally, we return the resulting image at line 61.
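To make the weighting scheme concrete, here is a tiny standalone sketch (with dummy arrays, not our actual images) showing that cv2.addWeighted computes image * alpha + segmentation_map * beta + gamma:

import cv2
import numpy as np

img = np.full((2, 2, 3), 100, dtype=np.uint8)  # stand-in for the original image
seg = np.full((2, 2, 3), 50, dtype=np.uint8)   # stand-in for the segmentation map
blended = cv2.addWeighted(img, 1.0, seg, 0.6, 0)  # 100 * 1.0 + 50 * 0.6 + 0 = 130
print(blended[0, 0])  # [130 130 130]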
The above two functions were the most important parts of this tutorial. If you are with me till now, then the rest of the article is pretty easy to follow.
Applying Mask R-CNN on Images
Now, we will write the code to apply the Mask R-CNN model to images of our choice. This part is going to be pretty easy, as we have already written most of our logic in the utils.py script.
All of this code will go into the mask_rcnn_images.py file.
Let’s start with the imports that we need.
import torch
import torchvision
import cv2
import argparse
from PIL import Image
from utils import draw_segmentation_map, get_outputs
from torchvision.transforms import transforms as transforms
We will be providing the path to the input image using command line arguments. So, let’s define our argument parser now.
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--input', required=True,
                    help='path to the input data')
parser.add_argument('-t', '--threshold', default=0.965, type=float,
                    help='score threshold for discarding detection')
args = vars(parser.parse_args())
We also have an optional threshold score in the above code block. By default, we will discard any detection that has a score lower than 0.965. You may either increase or decrease the value. Keep in mind that increasing it too much might lead to objects not being detected, while decreasing it too much might lead to many false positives.
Prepare the Model and Define the Transform
The next step is preparing our Mask R-CNN model.
# initialize the model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True, progress=True,
                                                            num_classes=91)
# set the computation device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# load the model on to the computation device and set to eval mode
model.to(device).eval()
At line 16, we initialize the model. Note that we have provided the pretrained argument as True. At line 21, we load the model onto the computation device and set it to eval() mode. Although a GPU is not strictly necessary since we will be working with images only, it is still better if you have one.
The following block of code defines the transforms that we will apply to the images.
# transform to convert the image to tensor
transform = transforms.Compose([
    transforms.ToTensor()
])
We are just converting the images to tensors. We do not need to apply any other transform to the images before feeding them to the Mask R-CNN model.
Read the Image and Apply Instance Segmentation
We will be providing the path to the image as a command line argument. So, we will read the image path from there. The next block of code reads the image and applies instance segmentation to it using the Mask R-CNN model.
image_path = args['input']
image = Image.open(image_path).convert('RGB')
# keep a copy of the original image for OpenCV functions and applying masks
orig_image = image.copy()
# transform the image
image = transform(image)
# add a batch dimension
image = image.unsqueeze(0).to(device)
masks, boxes, labels = get_outputs(image, model, args['threshold'])
result = draw_segmentation_map(orig_image, masks, boxes, labels)
# visualize the image
cv2.imshow('Segmented image', result)
cv2.waitKey(0)
# set the save path
save_path = f"../outputs/{args['input'].split('/')[-1].split('.')[0]}.jpg"
cv2.imwrite(save_path, result)
- At line 26 we are capturing the image path and then reading the image using PIL at line 27. We are also keeping a copy of the original unformatted image at line 29.
- Then we apply the transformations at line 32 and add an extra batch dimension at line 34.
- Line 36 is where we call the get_outputs() function from the utils script. Here we feed the transformed image to the Mask R-CNN model, and it returns the masks, boxes, and labels.
- At line 38, we call the draw_segmentation_map() function, which overlays the segmentation mask for each object on the original image.
- Then we visualize the resulting image on the screen.
- At line 45, we create a save_path name from the original input path, and at line 46 we save the resulting image to disk.
This is all the code we need to apply the Mask R-CNN deep learning instance segmentation model to images. We are all set to execute our code and see the results.
Execute the mask_rcnn_images.py File
Let’s see how well the Mask R-CNN model is able to detect and segment objects in images. If you are using the downloaded images, then make sure that you have unzipped the file and extracted its content into the input folder. It is all good if you wish to use your own images as well.
We will start with the first image from the input folder. Open up your terminal/command prompt and cd into the src directory of the project. Then type the following command.
python mask_rcnn_images.py --input ../input/image1.jpg
The following is the resulting segmented image.
Looks like the model is working really well. Along with all the humans in the image, it is also able to detect and segment the laptop and the potted plant. Still, the Mask R-CNN model is not able to completely detect the hand of the woman in the middle. Apart from that, all other detections and segmentations look really nice.
Now, let’s try something which does not contain any human being.
python mask_rcnn_images.py --input ../input/image2.jpg
In figure 4, we can see that the Mask R-CNN model is able to detect and segment the elephants really well. It is even able to detect and segment a partially visible elephant at the far left side.
Until now, everything is working fine. Now, let’s see a case where the Mask R-CNN model fails to some extent. Let’s try the model on the third image.
python mask_rcnn_images.py --input ../input/image3.jpg
Figure 5 shows some major flaws of the Mask R-CNN model. It struggles when it has to segment a group of people close together. Interestingly, the detections are all correct, but the model fails to properly segment the boy next to the soldier, the boy on the far right, and the soldier's leg. So, segmentation quality degrades when objects are very close to each other.
If you want, you can also try some more images and share your findings in the comment section.
Summary and Conclusion
In this article, you learned about instance segmentation in deep learning. You got hands-on experience by applying instance segmentation on images using the PyTorch Mask R-CNN model. I hope that you have learned something new from this tutorial.
If you have any doubts, thoughts, or suggestions, then please leave them in the comment section. I will surely address them.
You can contact me using the Contact section. You can also find me on LinkedIn, and Twitter.
Dear Sovit,
Thank you for such helpful tutorials.
Can you please upload maskrcnn PyTorch code for showing keypoints and mask output simultaneously?
I am happy that the post was helpful.
I am not sure whether Mask R-CNN also outputs keypoints. But yes, Keypoint R-CNN does output both bounding boxes and keypoints. So, maybe I can post one on that. I hope that is fine.
Sir I am reading through the blog now. I am not done yet, but so far what im reading is brilliant and I am learning. Thank you Sir for sharing your knowledge with the world.
Thank you Mark.
Hello Sovit,
Thank you for the wonderful blog post.
Could you please make a post on occlusion detection using PyTorch?
Hello vajrayudha. Thank you for the appreciation. Can you please tell me specifically what you mean by occlusion detection using deep learning/PyTorch, as this might be related to some specific domain as well?
Can you supply the source code, the versions of the modules, what version of python, and the environment (windows? linux? mac?) you are using. As well as the hardware (i.e. are you using GPU, if so, what model.)
I am on Windows 10 with PyTorch v1.8 stable, Cuda 11.1, GPU enabled with Titan RTX, Python 3.9.
I followed along and I get this error.
#——————————-
Traceback (most recent call last):
File “C:\code\src\mask_rcnn_images.py”, line 38, in
result = draw_segmentation_map(orig_image, masks, boxes, labels)
File “C:\code\src\src\utils.py”, line 52, in draw_segmentation_map
cv2.rectangle(image, boxes[i][0], boxes[i][1], color=color,
TypeError: argument for rectangle() given by name (‘color’) and position (3)
#——————————-
My modules installed in my conda virtual environment.
# Name Version Build Channel
argon2-cffi 20.1.0 py39h2bbff1b_1
async_generator 1.10 pyhd3eb1b0_0
attrs 20.3.0 pyhd3eb1b0_0
backcall 0.2.0 pyhd3eb1b0_0
blas 2.108 mkl conda-forge
blas-devel 3.9.0 8_mkl conda-forge
bleach 3.3.0 pyhd3eb1b0_0
ca-certificates 2021.1.19 haa95532_1
certifi 2020.12.5 py39haa95532_0
cffi 1.14.5 py39hcd4344a_0
colorama 0.4.4 pyhd3eb1b0_0
cudatoolkit 11.1.1 heb2d755_7 conda-forge
decorator 4.4.2 pyhd3eb1b0_0
defusedxml 0.7.1 pyhd3eb1b0_0
entrypoints 0.3 py39haa95532_0
freetype 2.10.4 h546665d_1 conda-forge
icu 58.2 ha925a31_3
importlib-metadata 2.0.0 py_1
importlib_metadata 2.0.0 1
intel-openmp 2020.3 h57928b3_311 conda-forge
ipykernel 5.3.4 py39h7b7c402_0
ipython 7.21.0 py39hd4e2768_0
ipython_genutils 0.2.0 pyhd3eb1b0_1
ipywidgets 7.6.3 pyhd3eb1b0_1
jedi 0.17.2 py39haa95532_1
jinja2 2.11.3 pyhd3eb1b0_0
jpeg 9b hb83a4c4_2
jsonschema 3.2.0 py_2
jupyter 1.0.0 py39haa95532_7
jupyter_client 6.1.7 py_0
jupyter_console 6.2.0 py_0
jupyter_core 4.7.1 py39haa95532_0
jupyterlab_pygments 0.1.2 py_0
jupyterlab_widgets 1.0.0 pyhd3eb1b0_1
libblas 3.9.0 8_mkl conda-forge
libcblas 3.9.0 8_mkl conda-forge
liblapack 3.9.0 8_mkl conda-forge
liblapacke 3.9.0 8_mkl conda-forge
libpng 1.6.37 h1d00b33_2 conda-forge
libsodium 1.0.18 h62dcd97_0
libtiff 4.1.0 h56a325e_1
libuv 1.41.0 h8ffe710_0 conda-forge
lz4-c 1.9.3 h8ffe710_0 conda-forge
m2w64-gcc-libgfortran 5.3.0 6 conda-forge
m2w64-gcc-libs 5.3.0 7 conda-forge
m2w64-gcc-libs-core 5.3.0 7 conda-forge
m2w64-gmp 6.1.0 2 conda-forge
m2w64-libwinpthread-git 5.0.0.4634.697f757 2 conda-forge
markupsafe 1.1.1 py39h2bbff1b_0
mistune 0.8.4 py39h2bbff1b_1000
mkl 2020.4 hb70f87d_311 conda-forge
mkl-devel 2020.4 h57928b3_312 conda-forge
mkl-include 2020.4 hb70f87d_311 conda-forge
msys2-conda-epoch 20160418 1 conda-forge
nbclient 0.5.3 pyhd3eb1b0_0
nbconvert 6.0.7 py39haa95532_0
nbformat 5.1.2 pyhd3eb1b0_1
nest-asyncio 1.5.1 pyhd3eb1b0_0
ninja 1.10.2 h5362a0b_0 conda-forge
notebook 6.2.0 py39haa95532_0
numpy 1.20.1 py39h6635163_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
opencv-python 4.5.1.48 pypi_0 pypi
openssl 1.1.1j h2bbff1b_0
packaging 20.9 pyhd3eb1b0_0
pandoc 2.11 h9490d1a_0
pandocfilters 1.4.3 py39haa95532_1
parso 0.7.0 py_0
pickleshare 0.7.5 pyhd3eb1b0_1003
pillow 8.1.2 py39h4fa10fc_0
pip 21.0.1 py39haa95532_0
prometheus_client 0.9.0 pyhd3eb1b0_0
prompt-toolkit 3.0.8 py_0
prompt_toolkit 3.0.8 0
pycparser 2.20 py_2
pygments 2.8.1 pyhd3eb1b0_0
pyparsing 2.4.7 pyhd3eb1b0_0
pyqt 5.9.2 py39hd77b12b_6
pyrsistent 0.17.3 py39h2bbff1b_0
python 3.9.2 h6244533_0
python-dateutil 2.8.1 pyhd3eb1b0_0
python_abi 3.9 1_cp39 conda-forge
pytorch 1.8.0 py3.9_cuda11.1_cudnn8_0 pytorch
pywin32 228 py39he774522_0
pywinpty 0.5.7 py39haa95532_0
pyzmq 20.0.0 py39hd77b12b_1
qt 5.9.7 vc14h73c81de_0
qtconsole 5.0.2 pyhd3eb1b0_0
qtpy 1.9.0 py_0
send2trash 1.5.0 pyhd3eb1b0_1
setuptools 52.0.0 py39haa95532_0
sip 4.19.13 py39hd77b12b_0
six 1.15.0 py39haa95532_0
sqlite 3.33.0 h2a8f88b_0
svgwrite 1.4.1 pypi_0 pypi
terminado 0.9.2 py39haa95532_0
testpath 0.4.4 pyhd3eb1b0_0
tk 8.6.10 h8ffe710_1 conda-forge
torchaudio 0.8.0 py39 pytorch
torchvision 0.9.0 py39_cu111 pytorch
tornado 6.1 py39h2bbff1b_0
traitlets 5.0.5 pyhd3eb1b0_0
typing_extensions 3.7.4.3 py_0 conda-forge
tzdata 2020f h52ac0ba_0
vc 14.2 h21ff451_1
vs2015_runtime 14.27.29016 h5e58377_2
wcwidth 0.2.5 py_0
webencodings 0.5.1 py39haa95532_1
wheel 0.36.2 pyhd3eb1b0_0
widgetsnbextension 3.5.1 py39haa95532_0
wincertstore 0.2 py39h2bbff1b_0
winpty 0.4.3 4
xz 5.2.5 h62dcd97_1 conda-forge
zeromq 4.3.3 ha925a31_3
zipp 3.4.0 pyhd3eb1b0_0
zlib 1.2.11 h62dcd97_4
zstd 1.4.9 h6255e5f_0 conda-forge
How do I fix my error?
Hello Cilsya. I am really sorry you are facing this issue. Obviously, I can’t go through each package that you have mentioned here. I use Windows (currently planning to move all my ML workflow to Ubuntu) and the code is based on Python 3.7.x. I have a GTX 1060. But that should not matter much. One thing I can suggest right now is to use Python 3.7 and check again. Most probably, the issue will be solved. Please reach back with updates.
I figured out that the issue is because in the cv2 rect function has as inputs the box coordinates as float tensors, i solved it by getting the item of each tensor and casting to integer, i dont know if there is a more efficient way, but it worked out.
u = boxes[i][0][0].item(), boxes[i][0][1].item()
v = boxes[i][1][0].item(), boxes[i][1][1].item()
x_min, y_min = int(u[0]), int(u[1])
x_max, y_max = int(v[0]), int(v[1])
# draw the bounding boxes around the objects
cv2.rectangle(image, (x_min, y_min), (x_max, y_max), color=color, thickness=2)
Hello Gesem. Thank you so much for pointing that out. The odd thing is I did not face that issue while executing the code myself. Although, in hindsight, I should have got that error. I will again run the code. If I face the issue, then I will update it.
Could you share the corrected version of the code, please? I am getting the same error. I did what you said but it still hasn’t improved.
It will be done within a few hours. Please check by end of the day (May 5, 2021). Sorry for the delay.
Edit: I have changed the code such that all the bounding box coordinates are integers by default. Please check again. I hope that it will run fine now.
Hi Thank you for your code.
It is so helpful for me.
I have a one question about custom data.
I downloaded picture in google and then moved this file to input folder.
But model didn’t work and gave error code.
This is code what I received File “mask_rcnn_image.py”, line 33, in
cv2.imshow(‘Segmented image’,result)
cv2.error: OpenCV(4.5.2) :-1: error: (-5:Bad argument) in function ‘imshow’
> Overload resolution failed:
> – mat is not a numpy array, neither a scalar
> – Expected Ptr for argument ‘mat’
> – Expected Ptr for argument ‘mat’
Can you tell me what I have to do?
Just check that the image from Google might have 4 color channels instead of 3. After reading the image try this
image = image[:, :, :3]
Hope this helps.
Thank you sir I resolved my problem.
I have another question.
There is no box labels but I want to segmentation boxes.
How could I solve this problem?
Hi. Generally, segmentation labels are only present for detected objects. I have never experimented with what you are trying to do before and will have to research it a bit. But most probably, only detected objects have segmentation labels.
Thank you for your answer.
It was very helpful to implement the model that I made.
In the code i could see that the data for boxes and masks is filtered below a specific threshold. But i could not see any such operations for labels. Although, i did not experience any issue. But i would think this may lead to some wrong labels, as the sequence may not match between masks/bounding box and labels
Hi Neelam. You are right, I had not filtered the labels. Actually while drawing the bounding boxes, we are only iterating over the boxes, so whenever that list ends, we stop. But still, you are right, we should filter everything. Thanks for the feedback. Will take care of it further on.
Hi ,
I am getting error like
File “C:\Users\Hannah\anaconda3\envs\py36\lib\site-packages\torchvision\models\detection\transform.py”, line 41, in forward
images[i] = image
RuntimeError: The expanded size of the tensor (1280) must match the existing size (1333) at non-singleton dimension 2. Target sizes: [3, 720, 1280]. Tensor sizes: [3, 749, 1333]
How to solve?
Hello. Is it an error from my code? If so, can you please let me know which Python file and line number are giving the error?
Hi,
Yes it is from this code , line no 60
masks, boxes, labels = get_outputs(image, model, 0.5)
Image dim showing as 4. I tried to convert to 3 by using image = image[:. :, :3] as you mentioned in above issue but it is showing syntax error
It seems that you are using an image with an alpha channel. I might be wrong though. Does the error show when using any of the images provided with this post? Can you please try and download a few new images from the internet and confirm they are RGB images with three channels before running the script.
Hello,
when my google colab and google drive are connected. Now if I want run this following command, what would be exactly the path for image if accessing from drive.
python mask_rcnn_images.py --input ../input/image1.jpg
I am getting error as
!pip install -e
import os
from google.colab import drive
drive.mount(‘/content/drive’)
os.chdir(“/content/drive/MyDrive/MaskRCNN_Pytorch/”)
!python /content/drive/MyDrive/MaskRCNN_Pytorch/src/mask_rcnn_images.py --input /content/drive/MyDrive/MaskRCNN_Pytorch/input/image2.jpg
Thanks in advance.
Hi Divyasha.
Can you please let me know what is the error that you are getting?
Also another thing that you can do is create the Python file in Colab directly and execute it from the Colab’s directory.
Like:
!python mask_rcnn_images.py --input path/to/your/image
That way you might be able to debug it faster. Let me know what error you are facing.
That works great thank you ! Some minor errors I wanted to report :
utils.py:line 20
masks = (outputs[0][‘masks’]>0.5).squeeze().detach().cpu().numpy()
squeeze() should be squeeze(1) : as the masks have dimensions (n, c, h, w), with c=1 (by definition because it’s a mask), and thus it’s the dimension that we want to squeeze. I have encountered a case where only one mask was detected, dimensions are then (1, 1, height, width) and squeeze() then reduce “n” and “c” dimensions.
utils.py:line 47 and 49
image = np.array(image)
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
These two lines should be just before the “for” loop :
as “image” is a variable living outside the scope of the “for” loop (in order to add one by one the detected object masks in transparency), cv2 would convert at each loop iteration the image from RGB to BGR (or BGR to RGB as the permutation is the same), resulting in doing this channel permutation twice if an even number of objects are detected. The image saved can then be in a wrong channel arrangement with weird colors.
I take the time to write this comment, precisely because this is the clearest tutorial I have seen and it helped me a lot, so I wanted to thank you properly <3
Hello David. Really appreciate the effort you put into all these little details. I will try to update the post as soon as possible.
Hi,
I am trying to execute your code with my own pretrained model with these lines :
model = torch.load(“model_folioles.pth”,map_location=torch.device(‘cpu’))
# set the computation device
device = torch.device(‘cpu’)
# load the modle on to the computation device and set to eval mode
model.to(device).eval()
but I have the following error :
Traceback (most recent call last):
File “demo.py”, line 14, in
model.to(device).eval()
AttributeError: ‘dict’ object has no attribute ‘to’
please help me !
best regards
Hello,
is exists a model for mask_rcnn_X_101_32x8d_FPN_3x.yaml models( in detectron2) in torchvision.models.detection and if yes what please ?
thank you
best regards
Hello Sylvian. I am not sure whether Detectron 2 models will work with this code. One main reason is that the code in this post loads the specific models and weights and I don’t think that happens through a YAML file which is the case for Detectron 2.
Hello,
Didn’t you try to do inference using libtorch and scripting model with
torch.jit.script?
Hello Andrew. I have not tried that yet. But will try to cover that in the future.
If you have any ideas how to manage that, I could check it and share results. Right now I’m trying to make a scripted Mask R-CNN model from torchvision, with no success: it doesn’t open in Netron and doesn’t load with torch::jit::load in C++.
I will try to check that out but may need some time. Mostly, I don’t work with C++ but will try to figure that out.
How can i contact you? I need your help with Mask R-CNN
Hi. You can reach out to me on [email protected]
Dear Sovit!
Thank you for your articles and lessons.
Some disadvantages of the Mask R-CNN model have been removed in the newer version of Mask R-CNN v2.
Added the weights COCO_V1 (they are used by default).
from torchvision.models.detection import MaskRCNN_ResNet50_FPN_V2_Weights as Weights_2
model = torchvision.models.detection.maskrcnn_resnet50_fpn_v2(
weights=Weights_2.DEFAULT,
pretrained=True,
progress=True,
num_classes=91)
Now the model correctly segments the soldier and the boys.
https://drive.google.com/file/d/1F7PI68rCVtfiSR40Lf0aLOav2hXuM-7R/view?usp=sharing
Thanks for the insights Sergey.