A few weeks ago, I posted a tutorial on Faster RCNN Object Detection with PyTorch. In that article, readers used deep learning and the Faster RCNN object detector to detect objects in videos and images. After going through the tutorial, one of the readers asked me if I could do a tutorial on detecting potholes in images of roads. He wanted to compare the performance of the Faster RCNN deep learning object detector and the YOLO deep learning object detector. After that, I got down to making the tutorial happen. We will not be able to compare two different deep learning object detectors here, but we will carry out road pothole detection with PyTorch Faster RCNN ResNet50.

Figure 1 shows an example output after we train a Faster RCNN model and use it to predict on the test data. You can also expect to get similar results after going through this tutorial.
What will you learn in this tutorial?
- How to use PyTorch for object detection on a real-world dataset.
- Using a PyTorch pre-trained model and fine-tuning it by training it on our own dataset.
- Using the Faster RCNN ResNet50 FPN model for training and detecting potholes in images of roads.
- Finally, detecting potholes in the test images using the trained model.
I hope that you are excited to move along with this tutorial. Let’s start.
The Dataset and the Project Directory
The dataset is perhaps the most important thing in deep learning and machine learning in general. We need a dataset to start anything in deep learning.
We will use the dataset that is provided in this paper. Thanks to S. Nienaber, M.J. Booysen, and R.S. Kroon for making this dataset public. This dataset contains almost 8 GB of image data. You can find the original data in this Google Drive link. But I recommend that you do not download the data from this link. The main reason is that you would have to do a lot of preprocessing yourself and also create the labeled dataframe yourself.
I have downloaded the whole Dataset 1 (Simplex) data and written a Python script to generate a CSV file with all the pothole instances and corresponding labels. Then I uploaded it as a Kaggle Dataset and made it public. It will be much easier for you if you download the dataset from this Kaggle Dataset link.
This dataset contains the whole Dataset 1 (Simplex) and a train_df.csv file, which contains all the annotated instances of all the potholes in the images.
Download the zip file and extract it inside the input folder, following the directory structure below.
The Directory Structure
The code for pothole detection using Faster RCNN is structured in the following manner.
│   config.py
│   dataset.py
│   engine.py
│   model.py
│   test.py
│   train.py
├───checkpoints
├───input
│   │   PotholeDataset.pdf
│   │   train_df.csv
│   └───Dataset 1 (Simplex)
│       └───Dataset 1 (Simplex)
│           ├───Test data
│           └───Train data
│               ├───Negative data
│               └───Positive data
├───test_predictions
- You can see that there are six Python scripts. We will get into the details of their content when writing the code for them.
- The input folder contains the dataset after we extract the data that we download from the Kaggle Dataset link. The Dataset 1 (Simplex) folder also contains two text files that have the annotations of the pothole images. You need not worry about that now. I have already created the train_df.csv file using the training annotation text file. The Train data folder contains the positive and negative pothole images. The Test data folder contains images that we will use for testing.
- The checkpoints folder will contain the trained model.
- Finally, test_predictions will contain all the output after we use our trained Faster RCNN object detector to detect potholes in the images inside Test data.
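The checkpoints and test_predictions folders need to exist before the scripts write into them, since torch.save() raises an error when the target folder is missing and cv2.imwrite() fails without raising. A minimal sketch for creating them, assuming the project root is passed in explicitly (the helper name make_output_dirs is hypothetical and not part of the tutorial's scripts):

```python
import os

def make_output_dirs(project_root="."):
    """Create the folders that the training and test scripts write into."""
    created = []
    for name in ("checkpoints", "test_predictions"):
        path = os.path.join(project_root, name)
        # exist_ok=True makes the call safe to repeat
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

Calling make_output_dirs() once from the project directory before running train.py should be enough.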
Here are a few images from the dataset with potholes in them.

Figure 2 shows a few images from the dataset that contain potholes in them. I recommend that you explore the dataset on your own a bit before moving further.
A Note Before Moving Ahead
I hope that you have set up the directory for the project as per the above section. Now, you will notice that the train_df.csv file contains the instances with positive examples only. This means that we are considering only those images for training that contain potholes. We will not be training the model on images that do not have any potholes.
The truth is, while training deep learning object detectors, it rarely hurts to train on positive instances only. This means that we can show the detector only those images that contain at least one labeled instance of the object we want to detect. If we leave out the images that do not contain any object, then performance does not decrease in most cases. Benchmark datasets like Pascal VOC and MS COCO are set up in much the same way: almost all of their images contain at least one labeled object. Therefore, we are ignoring the negative images while training on this pothole image dataset.
Now, we are all set to jump into the coding part of this tutorial.
Install the Required Libraries and Frameworks
- We will use the PyTorch deep learning framework in this tutorial. Be sure to install the latest version of PyTorch before moving ahead.
- For the transformations, we will use the Albumentations library. Install this before moving ahead as well.
pip install albumentations
- If you want to know more about the usage of Albumentations, then you may check one of my previous articles: Image Augmentation using PyTorch and Albumentations.
Pothole Detection using Faster RCNN ResNet50 and PyTorch
Starting from this section onward, we will write the code to detect potholes in roads using deep learning object detection. We will move step-by-step while writing the code for each of the Python scripts. I will point out which code goes into which script to avoid confusion.
Let’s start with the configuration Python script, which is config.py.
Setting Up the Configuration Python Script
All the code in this section will go into the config.py file. The config.py Python script will contain all the training configurations. These include the training and test data paths, the number of epochs to train for, the batch size, and some other details as well.
Let’s write the code and then we will get into the details.
ROOT_PATH = 'input/Dataset 1 (Simplex)/Dataset 1 (Simplex)'
TEST_PATH = 'input/Dataset 1 (Simplex)/Dataset 1 (Simplex)/Test data'
PREDICTION_THRES = 0.8
EPOCHS = 5
MIN_SIZE = 800
BATCH_SIZE = 2
DEBUG = False # to visualize the images before training
We have the MIN_SIZE argument set to 800. This is the size that the Faster RCNN ResNet50 model will resize the shorter side of the input image to. This is a very important argument too. We can often get better results by setting this to a higher resolution like 1024. But that will also increase the training time. For now, we are keeping it to the default size of 800. If you want to know more about the effect of the MIN_SIZE value, then you should take a look at this tutorial.
PREDICTION_THRES is the confidence threshold that we will use while testing on the test data. Any detections below the confidence value of 0.8 will be rejected.
Then we have the DEBUG argument. If this argument is True, then the code will show us a few annotated input images before training. Also, we will be training the model for 5 epochs.
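To make the effect of MIN_SIZE a bit more concrete, here is a rough sketch of how the resize factor is chosen. It assumes torchvision's usual behavior of scaling the shorter image side to min_size unless that would push the longer side past max_size (1333 by default). The function name and the 1920x1080 image size are only for illustration.

```python
def resize_scale(height, width, min_size=800, max_size=1333):
    """Approximate the scale factor applied to an image before detection."""
    shorter, longer = min(height, width), max(height, width)
    # scale the shorter side to min_size, but never let the
    # longer side grow beyond max_size
    return min(min_size / shorter, max_size / longer)

# hypothetical 1920x1080 input image
scale = resize_scale(1080, 1920)
print(f"scale: {scale:.4f}")
print(f"resized to about {1920 * scale:.0f} x {1080 * scale:.0f}")
```

For a wide image like this one, the max_size cap decides the final size, so raising MIN_SIZE alone would not change the result under these assumptions.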
Preparing the Faster RCNN ResNet50 FPN Model for Pothole Detection
Now, we will write the code to load the Faster RCNN ResNet50 FPN model. We will use the pre-trained weights that PyTorch provides. We will just change the head of the model so that it predicts our own classes instead of the MS COCO classes. The code here will go into the model.py Python file.
The following are the imports that we will need to prepare the deep learning object detector model.
"""
Python script to prepare FasterRCNN model.
"""

import torch
import torchvision

from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
import config
Next, let’s write the function to prepare the model.
def model():
    # load the COCO pre-trained model
    # we will keep the image size to the original 800 for faster training,
    # you can increase the `min_size` in `config.py` for better results,
    # although it may increase the training time (a trade-off)
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True,
                                                                 min_size=config.MIN_SIZE)
    # one class is for pot holes, and the other is background
    num_classes = 2
    # get the input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace pre-trained head with our features head
    # the head layer will classify the images based on our data input features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
The comments in the above code block should help you follow it.
- Take a look at lines 6 and 7. We are using pretrained=True and the MIN_SIZE argument from the config.py script. Note that in torchvision 0.13 and later, pretrained=True is deprecated in favor of the weights argument.
- Now, you may wonder why we have two classes when there is only one class in our dataset, that is, “pothole”. Well, one class is “pothole” and the other is the mandatory background class.
- On line 11, we are getting the in_features. This is the number of input features that the classifier head of the MS COCO pre-trained model expects.
- Finally, on line 14, we initialize the FastRCNNPredictor with the in_features and the number of classes (num_classes).
Preparing the Dataset for Pothole Detection using Faster RCNN
Here, we will prepare the dataset for training. This includes the PotHoleDataset() class and the data loader as well. We will also define the training image transformations here. Again, be sure to install the Albumentations library before moving ahead.
This code will go into the dataset.py file.
Import the following modules and libraries.
"""
Python script to prepare the dataset
"""

import os
import numpy as np
import cv2
import torch
import glob
import albumentations as A
import pandas as pd
import config

from torch.utils.data import Dataset
from albumentations.pytorch.transforms import ToTensorV2
from torch.utils.data import DataLoader
You will notice that we are importing ToTensorV2 from albumentations. This is the Albumentations implementation to convert images into tensors.
The Dataset Class
The following is the complete PotHoleDataset() class.
class PotHoleDataset(Dataset):
    def __init__(self, dataframe, image_dir, transforms=None):
        super().__init__()

        self.image_ids = dataframe['image_id'].unique()
        self.df = dataframe
        self.image_dir = image_dir
        self.transforms = transforms

    def __getitem__(self, index: int):

        image_id = self.image_ids[index]
        records = self.df[self.df['image_id'] == image_id]

        image = cv2.imread(f"{self.image_dir}/Train data/Positive data/{image_id}.JPG", cv2.IMREAD_COLOR)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)
        image /= 255.0

        # convert the boxes into x_min, y_min, x_max, y_max format
        boxes = records[['x', 'y', 'w', 'h']].values
        boxes[:, 2] = boxes[:, 0] + boxes[:, 2]
        boxes[:, 3] = boxes[:, 1] + boxes[:, 3]

        # get the area of the bounding boxes
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        area = torch.as_tensor(area, dtype=torch.float32)

        # we have only one class
        labels = torch.ones((records.shape[0],), dtype=torch.int64)

        # supposing that all instances are not crowd
        iscrowd = torch.zeros((records.shape[0],), dtype=torch.int64)

        target = {}
        target['boxes'] = boxes
        target['labels'] = labels
        target['image_id'] = torch.tensor([index])
        target['area'] = area
        target['iscrowd'] = iscrowd

        # apply the image transforms
        if self.transforms:
            sample = {
                'image': image,
                'bboxes': target['boxes'],
                'labels': labels
            }
            sample = self.transforms(**sample)
            image = sample['image']

            # convert the bounding boxes to PyTorch `FloatTensor`
            target['boxes'] = torch.stack(tuple(map(torch.FloatTensor,
                                                    zip(*sample['bboxes'])))).permute(1, 0)

        return image, target, image_id

    def __len__(self):
        return self.image_ids.shape[0]
We will not go into much detail about the above class. We will take a look at just some of the important lines of code. If you have any doubts, then feel free to ask in the comment section. I will surely answer them.
- We have the bounding boxes in the x_min, y_min, width, and height format. We are converting that to x_min, y_min, x_max, and y_max format from lines 20 to 22.
- At lines 25 and 26, we get the area of the bounding boxes.
- We prepare the target dictionary starting from line 34 till line 39. This will act as our training labels.
- Then we apply the image transforms starting from line 42 and convert the bounding boxes to FloatTensor at line 52.
- Finally, we return the image, target, and image_id at line 55.
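The box conversion and area computation described above can be sketched in plain Python for a single annotation. The helper names are hypothetical; the sample values are the first row of train_df.csv as printed in the training output further down.

```python
def xywh_to_xyxy(x, y, w, h):
    """Convert (x_min, y_min, width, height) to (x_min, y_min, x_max, y_max)."""
    return x, y, x + w, y + h

def box_area(x_min, y_min, x_max, y_max):
    """Area of a box given in corner format."""
    return (x_max - x_min) * (y_max - y_min)

box = xywh_to_xyxy(1990, 1406, 66, 14)
print(box, box_area(*box))  # (1990, 1406, 2056, 1420) 924
```

The dataset class does the same thing for all boxes of an image at once using NumPy column operations.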
If you wish to go deeper into preparing the data for PyTorch object detection in general, then be sure to take a look at this official GitHub repository. You will find the dataset preparation code inside the coco_utils.py file.
Now, we will define two functions.
The collate_fn() Function
The collate_fn() function helps when we have a different number of instances in the images. This leads to a different number of targets for each image in a batch, which the default batching cannot stack into a single tensor. The collate_fn() function takes a single batch of data and returns it as a tuple of tuples: one for the images, one for the targets, and one for the image IDs.
def collate_fn(batch):
    """
    This function helps when we have different number of object instances
    in the batches in the dataset.
    """
    return tuple(zip(*batch))
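To see what tuple(zip(*batch)) does, here is a small stand-alone illustration. Plain Python values stand in for the image tensors and target dictionaries, and the sample data is made up.

```python
def collate_fn(batch):
    # same one-liner as in dataset.py
    return tuple(zip(*batch))

# two samples with a different number of boxes each
batch = [
    ("image_0", {"boxes": [[10, 20, 50, 60]]}, "id_0"),
    ("image_1", {"boxes": [[5, 5, 30, 40], [70, 80, 90, 120]]}, "id_1"),
]

images, targets, image_ids = collate_fn(batch)
print(images)                              # ('image_0', 'image_1')
print(image_ids)                           # ('id_0', 'id_1')
print([len(t["boxes"]) for t in targets])  # [1, 2]
```

Because the targets stay in a tuple instead of being stacked into one tensor, it does not matter that the images contain a different number of boxes.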
Function for Image Transforms
Let’s define a function for the image transforms. We will call it train_transform().
# function for the image transforms
def train_transform():
    return A.Compose([
        # pass the probability by keyword: in older Albumentations releases
        # the first positional argument is `always_apply`, not `p`
        A.Flip(p=0.5),
        # A.RandomRotate90(p=0.5),
        # A.MotionBlur(p=0.2),
        # A.MedianBlur(blur_limit=3, p=0.1),
        # A.Blur(blur_limit=3, p=0.1),
        ToTensorV2(p=1.0)
    ], bbox_params={
        'format': 'pascal_voc',
        'label_fields': ['labels']
    })
We are just flipping the images with a probability of 0.5 and converting them to tensors. Note that A.Flip picks a horizontal, vertical, or combined flip at random; A.HorizontalFlip(p=0.5) restricts this to horizontal flips. You will see that there are a lot of commented-out transforms. Although we can use those, we will not use them in this tutorial. Using them may make the model more robust, at the cost of a slight increase in training time. Be sure to train using these transforms on your own some time and share your findings in the comment section.
Prepare the Training Dataframe
Let’s prepare the training DataFrame now. Take a look at the following code block.
# path to the input root directory
DIR_INPUT = config.ROOT_PATH
# read the annotation CSV file
train_df = pd.read_csv("input/train_df.csv")
print(train_df.head())
print(f"Total number of image IDs (objects) in dataframe: {len(train_df)}")

# get all the image paths as list
image_paths = glob.glob(f"{DIR_INPUT}/Train data/Positive data/*.JPG")
image_names = []
for image_path in image_paths:
    image_names.append(image_path.split(os.path.sep)[-1].split('.')[0])
print(f"Total number of training images in folder: {len(image_names)}")
image_ids = train_df['image_id'].unique()
print(f"Total number of unique train images IDs in dataframe: {len(image_ids)}")

# number of images that we want to train out of all the unique images
train_ids = image_names[:] # use all the images for training
train_df = train_df[train_df['image_id'].isin(train_ids)]
print(f"Number of image IDs (objects) training on: {len(train_df)}")
Sometimes, in a dataset, we may have an image name in the CSV file, but that image may not be present in the image folder. To check for this we just need a single line of code. We are doing that at lines 18 and 19.
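The idea behind this check can be sketched without pandas. The helper name is hypothetical, and the file names and IDs below are invented for illustration.

```python
import os

def ids_with_images(annotation_ids, image_paths):
    """Keep only the annotation IDs that have a matching image file."""
    # file name without folder and extension, e.g. 'G0010033'
    names_on_disk = {
        os.path.splitext(os.path.basename(path))[0] for path in image_paths
    }
    return [image_id for image_id in annotation_ids if image_id in names_on_disk]

annotation_ids = ["G0010033", "G0010034", "G0010099"]
image_paths = ["Positive data/G0010033.JPG", "Positive data/G0010034.JPG"]
print(ids_with_images(annotation_ids, image_paths))  # ['G0010033', 'G0010034']
```

In dataset.py, the isin() call on the dataframe plays the role of the membership test here.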
The Train Dataset and Train Data Loader
Finally, for the dataset preparation part, we need to initialize the PotHoleDataset() and define the train data loader.
train_dataset = PotHoleDataset(train_df, DIR_INPUT, train_transform())

train_data_loader = DataLoader(
    train_dataset,
    batch_size=config.BATCH_SIZE,
    shuffle=False,
    collate_fn=collate_fn
)
We are using the batch size from the config.py file, that is, a batch size of 2. If you face an OOM (Out of Memory) error while training, then reduce the batch size to 1.
Writing the Training and Some Helper Functions
In this section, we will write the training function and some helper functions along with that. The code in this section will go into the engine.py file.
Importing the modules first.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch

from model import model
from dataset import train_data_loader
The next few lines of code initialize the device and the model, and define the optimizer that we will use.
# the computation device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)
We will use the SGD() optimizer with a learning rate of 0.001, a momentum of 0.9, and a weight decay of 0.0005.
The Training Function
There are a few important points that we need to take care of while training the Faster RCNN ResNet50 FPN model. This is true for any of the PyTorch pre-trained deep learning object detectors. When we use these pre-trained models on our own dataset and fine-tune them, then we have to keep a few things in mind. We will go over these points after we write the function.
def train(train_dataloader):
    model.train()
    running_loss = 0
    for i, data in enumerate(train_dataloader):
        optimizer.zero_grad()
        images, targets, images_ids = data[0], data[1], data[2]
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # in training mode, the model returns a dictionary of losses
        loss_dict = model(images, targets)

        loss = sum(loss for loss in loss_dict.values())
        running_loss += loss.item()

        loss.backward()
        optimizer.step()

        if i % 25 == 0:
            print(f"Iteration #{i} loss: {loss}")
    train_loss = running_loss/len(train_dataloader.dataset)
    return train_loss
Explanation of the Training Function
- Lines 6, 7, and 8 are quite usual, where we extract the images and the target labels from the data.
- Did you notice that we did not define any loss function before the training function? This is because we need to provide the Faster RCNN ResNet50 object detector with both the images and the target labels, and in training mode it computes the losses itself. Take a look at line 11. We directly get a loss dictionary in this line that we save in loss_dict. Now, if you print loss_dict, then you will find something similar to this.
# output format of `loss_dict` in training mode
{'loss_classifier': tensor(0.8491, device='cuda:0', grad_fn=<NllLossBackward>),
 'loss_box_reg': tensor(0.0608, device='cuda:0', grad_fn=<DivBackward0>),
 'loss_objectness': tensor(4.9780, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>),
 'loss_rpn_box_reg': tensor(0.5585, device='cuda:0', grad_fn=<DivBackward0>)}
- We have a dictionary of different loss values with the keys indicating the type of loss. We have the classification loss, the bounding box regression loss, the objectness loss, and the RPN bounding box regression loss for Faster RCNN.
- Keeping this in mind, we add all the loss values at line 13. Then we add the batch loss to the running_loss to keep track of the epoch-wise loss.
- At lines 16 and 17, we backpropagate the gradients and update the model parameters.
- Also, we are printing the loss values every 25 iterations to keep a close track of our progress. This is because training for one epoch takes a long time and we should know whether the loss is actually decreasing or not.
- Finally, we are calculating the epoch-wise loss, that is, train_loss (the running loss divided by the number of training images), and returning it.
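As a small numeric illustration of the bookkeeping above, the following sketch sums a loss dictionary and averages a running loss. Plain floats stand in for the tensors, and the per-batch values and counts in the second part are made up.

```python
# rounded values from the example `loss_dict` shown above
loss_dict = {
    "loss_classifier": 0.8491,
    "loss_box_reg": 0.0608,
    "loss_objectness": 4.9780,
    "loss_rpn_box_reg": 0.5585,
}

# the batch loss is the sum of the four components
batch_loss = sum(loss_dict.values())
print(f"batch loss: {batch_loss:.4f}")  # batch loss: 6.4464

# hypothetical epoch: 3 batches of 2 images each
batch_losses = [0.9, 0.6, 0.3]
running_loss = sum(batch_losses)
num_images = 6

# train() divides by the number of images
print(f"loss per image: {running_loss / num_images:.2f}")
# dividing by the number of batches gives a different scale
print(f"loss per batch: {running_loss / len(batch_losses):.2f}")
```

Dividing by the number of images gives a smaller number than dividing by the number of batches whenever the batch size is larger than 1, so keep the divisor in mind when comparing loss values across setups.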
Some Helper Functions
First, let’s define the function to save the trained model.
def save_model():
    torch.save(model.state_dict(), 'checkpoints/fasterrcnn_resnet50_fpn.pth')
The model will be saved inside the checkpoints folder.
Next, we will define a function called visualize(). This function will only execute if DEBUG=True inside the config.py script. If this function executes, then it will show some of the annotated training images before training begins.
def visualize():
    """
    This function will only execute if `DEBUG` is `True` in
    `config.py`.
    """
    images, targets, image_ids = next(iter(train_data_loader))
    images = list(image for image in images)
    targets = [{k: v for k, v in t.items()} for t in targets]
    for i in range(1):
        boxes = targets[i]['boxes'].cpu().numpy().astype(np.int32)
        sample = images[i].permute(1,2,0).cpu().numpy()
        fig, ax = plt.subplots(1, 1, figsize=(15, 12))
        for box in boxes:
            cv2.rectangle(sample,
                          (box[0], box[1]),
                          (box[2], box[3]),
                          (220, 0, 0), 3)
        ax.set_axis_off()
        plt.imshow(sample)
        plt.show()
The Training Script
The training script is going to be very simple and concise. We have already defined all the functions that we need. We just need to call those functions. The code in this part will go into the train.py file.
First, import the modules and libraries that we need.
import torch
import matplotlib
import matplotlib.pyplot as plt
import time

from model import model
import config
from engine import train, visualize, save_model
from dataset import train_data_loader, train_dataset
Next, visualize the annotated training images, if DEBUG=True in config.py.
if config.DEBUG:
    visualize()
The next few lines of code train the Faster RCNN ResNet50 on our road pothole images.
num_epochs = config.EPOCHS
for epoch in range(num_epochs):
    start = time.time()
    train_loss = train(train_data_loader)
    print(f"Epoch #{epoch} loss: {train_loss}")
    end = time.time()
    print(f"Took {(end - start) / 60} minutes for epoch {epoch}")
We just run a simple for loop and print the loss after each epoch. Also, we print the time that it takes for the completion of one epoch.
Finally, we save the trained model by calling save_model() after the loop finishes.
We have the training code ready. Now, it is time to execute train.py.
Training Faster RCNN for Pothole Detection
Move to the project directory in your command line or terminal and execute the train.py script.
python train.py
If you have DEBUG=True in the config.py file, then first you will see some of the training images. I am skipping that part here. The following is the truncated output from the training.
   image_id  num_potholes     x     y    w   h
0  G0010033             6  1990  1406   66  14
1  G0010033             6  1464  1442   92  16
2  G0010033             6  1108  1450   54  16
3  G0010033             6   558  1434  102  16
4  G0010033             6   338  1450   72  18
Total number of image IDs (objects) in dataframe: 4592
Total number of training images in folder: 1119
Total number of unique train images IDs in dataframe: 1337
Number of image IDs (objects) training on: 3896
Iteration #0 loss: 8.82939338684082
Iteration #25 loss: 0.40030747652053833
Iteration #50 loss: 0.6408292055130005
Iteration #75 loss: 0.47089526057243347
Iteration #100 loss: 0.1265372484922409
Iteration #125 loss: 0.251159131526947
Iteration #150 loss: 0.237876296043396
Iteration #175 loss: 0.5076833367347717
Iteration #200 loss: 0.458962082862854
Iteration #225 loss: 0.18618100881576538
Iteration #250 loss: 0.1883908063173294
Iteration #275 loss: 0.35126793384552
Iteration #300 loss: 0.17349722981452942
Iteration #325 loss: 0.4572589099407196
Iteration #350 loss: 0.3761522173881531
Iteration #375 loss: 0.3168582320213318
Iteration #400 loss: 0.6698653697967529
Iteration #425 loss: 0.11370620876550674
Iteration #450 loss: 0.09485868364572525
Iteration #475 loss: 0.2052663117647171
Iteration #500 loss: 0.6903306245803833
Iteration #525 loss: 0.1825105845928192
Iteration #550 loss: 0.1253437101840973
Epoch #0 loss: 0.18087337265278847
Took 14.302103877067566 minutes for epoch 0
Iteration #0 loss: 0.8553087711334229
Iteration #25 loss: 0.3376452624797821
...
Epoch #4 loss: 0.12217348455613259
Took 13.162632573102414 minutes for epoch 4
A single epoch takes somewhere around 13 to 14 minutes on a GTX 1060. Yours may take less or more time depending on the GPU that you have. By the end of 5 epochs, we have a loss value of 0.1221. This seems good enough for just 5 epochs. Still, we cannot say much until we test our model on the test images.
Inference for Pothole Detection using Faster RCNN ResNet50 and PyTorch
In this section, we will write the code for testing our trained deep learning object detector on the test images.
All of this code will go into the test.py file.
The following are the imports that we need.
import numpy as np
import cv2
import os
import torch

from tqdm import tqdm
import config
from model import model
Let’s set the computation device and load the trained model weights.
# set the computation device
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# load the model and the trained weights
model = model().to(device)
# map_location lets weights that were saved on a GPU load on a CPU-only machine too
model.load_state_dict(torch.load('checkpoints/fasterrcnn_resnet50_fpn.pth', map_location=device))
The following lines of code read all the image file names and store them in a list called test_images.
DIR_TEST = config.TEST_PATH
test_images = os.listdir(DIR_TEST)
print(f"Validation instances: {len(test_images)}")
Reading All of the Images and Detecting the Potholes in Them
Here, we will have a single block of code. We will loop over all of the image paths, read the images using OpenCV, and detect the potholes in each of them.
detection_threshold = config.PREDICTION_THRES
model.eval()
with torch.no_grad():
    for i, image in tqdm(enumerate(test_images), total=len(test_images)):

        orig_image = cv2.imread(f"{DIR_TEST}/{test_images[i]}", cv2.IMREAD_COLOR)
        image = cv2.cvtColor(orig_image, cv2.COLOR_BGR2RGB).astype(np.float32)
        # make the pixel range between 0 and 1
        image /= 255.0
        image = np.transpose(image, (2, 0, 1)).astype(np.float32)
        image = torch.tensor(image, dtype=torch.float).to(device)
        image = torch.unsqueeze(image, 0)

        cpu_device = torch.device("cpu")

        outputs = model(image)

        outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
        if len(outputs[0]['boxes']) != 0:
            # keep only the detections that score at or above the threshold
            boxes = outputs[0]['boxes'].data.numpy()
            scores = outputs[0]['scores'].data.numpy()
            boxes = boxes[scores >= detection_threshold].astype(np.int32)
            draw_boxes = boxes.copy()

            for box in draw_boxes:
                cv2.rectangle(orig_image,
                              (int(box[0]), int(box[1])),
                              (int(box[2]), int(box[3])),
                              (0, 0, 255), 3)
                cv2.putText(orig_image, 'PotHole',
                            (int(box[0]), int(box[1]-5)),
                            cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255),
                            2, lineType=cv2.LINE_AA)
            cv2.imwrite(f"test_predictions/{test_images[i]}", orig_image)
print('TEST PREDICTIONS COMPLETE')
Explanation of the Above Code Block
- First of all, we are defining the detection confidence threshold at line 1. At line 2, we are setting the model to eval mode, which is very important.
- Starting from line 4, we loop over all the image paths and read the images for detection.
- We get the outputs at line 16.
- At line 18, we move all the outputs onto the CPU.
- At line 19, we first check whether the outputs contain any boxes or not. We only move forward if the model has predicted the bounding box coordinates.
- Then we extract the bounding box coordinates and the confidence scores. At line 23, we get hold of only those bounding boxes that have a score at or above the threshold.
- Starting from line 26, we loop over all the bounding boxes in an image and draw the rectangles using OpenCV. We also write the text ‘PotHole’ using the OpenCV putText() function for easier interpretation.
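The score filtering step can be illustrated with plain Python lists. The helper name is hypothetical and the boxes and scores below are invented; in test.py the same selection is done with a NumPy boolean mask.

```python
def filter_detections(boxes, scores, threshold=0.8):
    """Return the boxes whose confidence score is at least `threshold`."""
    return [box for box, score in zip(boxes, scores) if score >= threshold]

boxes = [[100, 120, 180, 160], [300, 310, 340, 330], [50, 60, 90, 80]]
scores = [0.97, 0.42, 0.80]
print(filter_detections(boxes, scores))
# [[100, 120, 180, 160], [50, 60, 90, 80]]
```

Lowering the threshold keeps more boxes at the risk of more false positives, while raising it does the opposite.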
Execute the test.py File
Now, we are ready to detect potholes in the images. Execute the test.py script from the command line/terminal.
python test.py
You will see the output similar to this.
Validation instances: 628
  1%|▉         | 9/628 [00:13<15:59, 1.55s/it]
Detecting on all the images will take some time to run. If you want to detect potholes only in a few images, then quit the program after a few iterations. You will have the detection output images inside the test_predictions folder.
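Instead of interrupting the script, you could also restrict the list of test images up front. A minimal sketch, assuming a hypothetical helper that keeps only image files and optionally caps how many are processed (the file names are invented):

```python
def select_test_images(file_names, limit=None, extensions=(".jpg", ".jpeg", ".png")):
    """Keep image files only and optionally cap how many are returned."""
    images = sorted(
        name for name in file_names
        if name.lower().endswith(extensions)
    )
    return images if limit is None else images[:limit]

names = ["G0011476.JPG", "notes.txt", "G0011477.JPG", "G0011480.JPG"]
print(select_test_images(names, limit=2))  # ['G0011476.JPG', 'G0011477.JPG']
```

In test.py, this would wrap the list returned by os.listdir(DIR_TEST). Filtering by extension also avoids passing non-image files to cv2.imread().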
Analyzing the Detection Outputs
Let’s take a look at a few images that the Faster RCNN ResNet50 object detector has detected potholes in. There are more than 600 test images but we will take a look at just a few.
The Successful Detections

In this image, the Faster RCNN ResNet50 object detector detects the two potholes successfully. Both potholes are large and clearly visible on the road, so this was likely an easy case for the detector.
But what about multiple potholes where the potholes are much smaller? Can the Faster RCNN ResNet50 detector detect those?

In figure 4, there are five potholes and two of them are small ones as well. Yet the Faster RCNN ResNet50 model is able to detect all of them successfully. Looks like our deep learning object detector has learned well.
Some Failed Detections
Now, let’s take a look at a few of the failed test cases. Take a look at the following image.

First of all, the Faster RCNN ResNet50 detector detects the pothole wrongly. It is detecting a patch of grass on the sidewalk as a pothole. And secondly, it is totally unable to detect the actual pothole in the middle of the road. I have marked it in the red circle with the text alongside it.
There are probably two main reasons for this failure. First of all, the pothole in this road image is somewhat different. It looks like sand, but if you zoom in, you will see that it is actually a pothole. This may have made it difficult for the Faster RCNN ResNet50 object detector to detect this pothole. Then again, we have trained the model for only 5 epochs. With more training, it may well be able to detect this pothole too.
Do try more training on your own and share your results in the comment section. It will help others as well.
We will bring this tutorial to an end here.
Summary and Conclusion
In this article, you learned how to train the Faster RCNN ResNet50 FPN for pothole detection. We covered the basics that make up the groundwork of such a system. There are many more things to experiment with.
- We can try training for more epochs to get even better results.
- We can try using different backbones like Faster RCNN ResNet101 for pothole detection.
- Using Mask RCNN for both segmentation and detection of the potholes will also make it an even better project.
- There is also scope for making it a real-time system where we can detect potholes in videos. Although that would require some more work. I hope that you try this one too.
Do try to experiment with the above options. This will surely help you in your learning. I hope that I was able to give you the groundwork to move further.
If you have any doubts, thoughts, or suggestions, then please leave them in the comment section. I will surely address them.
