Traffic Sign Detection using PyTorch and Pretrained Faster RCNN Model

In the last post, we carried out traffic sign recognition on the GTSRB dataset. Here, in this post, we will take a step further. We will carry out traffic sign detection using PyTorch and pretrained Faster RCNN models.

From the last post, it was pretty clear that a complete traffic sign recognition system requires two components: detection and classification of the traffic signs. In the real world, an autonomous vehicle first detects where a traffic sign is (detection/localization) and then recognizes it (image classification). Again, this post is not about replicating any component of autonomous driving. It is about learning the very basic procedures and concepts through a toy object detection project. Hopefully, it will turn out to be both fun and inspiring.

Figure 1. One of the inference results from traffic sign detection using PyTorch and Faster RCNN. We will train a model that is capable of such detections.

Obviously, for traffic sign detection using PyTorch, we will not be building anything from scratch. We will leverage existing pretrained models and utilities to create an end-to-end pipeline for traffic sign detection. In fact, we will try to keep it as modular as possible so that you can switch to any other dataset in the future by just changing the dataset path. Specifically, we will use the Faster RCNN model for detection. We will fine-tune a pretrained Faster RCNN model with a MobileNetV3 Large FPN backbone and check its inference performance on both images and videos.

This is the second post in the traffic sign recognition and detection series.

Topics to Cover

We will cover the following topics in this post:

  • We will start with the exploration of the dataset that we will use for traffic sign detection. This is the GTSDB dataset.
  • During the coding phase, we will focus on a few very important things. These include:
    • The dataset preparation in the correct format. This is the step before creating the PyTorch dataset and PyTorch data loaders.
    • Then we will discuss the configuration Python file.
    • In regard to the training and inference dataset, we will mostly focus on the augmentations that we apply for object detection here.
    • Then we will discuss the training script in brief.
  • After the training completes, we will discuss the inference code and then carry out inference on images and videos.
  • Finally, we will end the post by discussing some of the further steps that we can take to make this project even better.

The GTSDB Dataset

We will use the German Traffic Sign Detection Benchmark dataset for traffic sign detection using PyTorch and Faster RCNN in this post. This dataset was mainly created for researchers who wanted to take on the task of an image-based driver assistance system. Although a bit old, this dataset will perfectly fit our purpose of learning more about object detection and testing different Faster RCNN models on it.

This dataset is closely related to the German Traffic Sign Recognition Benchmark dataset. The GTSDB dataset contains the same classes as GTSRB. And it was also part of the IJCNN (International Joint Conference on Neural Networks) 2013.

The following are a few important details about the dataset:

  • It contains a total of 900 images. 600 of them are training images and 300 are for evaluation.
  • It contains 43 classes, out of which the following are a few:
    • 0 = speed limit 20 (prohibitory)
    • 1 = speed limit 30 (prohibitory)
    • 22 = uneven road (danger)
    • 23 = slippery road (danger)
    • 41 = restriction ends (overtaking) (other)
    • 42 = restriction ends (overtaking (trucks)) (other)

You can easily explore all the images and classes after downloading the entire dataset.

Also, to get a sense of the type of the objects and the classes, the following figure shows a few traffic signs along with the annotations.

Figure 2. Traffic sign images with ground truth annotations.

As we can see, some of the signs are pretty difficult to recognize as they are very small. Only a really good object detection model will be able to perform well on this dataset. In this post, we will check out the performance of two different Faster RCNN models. This will give us a good idea of what kind of detector is better suited for such a dataset.

Downloading the Dataset

There are a few mandatory dataset files that you need to download via this link. You need to download the,, and gt.txt files. You can visit the link to download them or click on the following direct download links:

We also need a few other files, but you will get direct access to them when downloading the zip file for this post. We will also process the original images and prepare the training and validation images from them; this is part of the coding section of this post.

The Original Dataset Structure

For now, let’s take a look at the original structure of the downloaded files. This is after extracting the downloaded zip files.

├── TrainIJCNN2013
│   ├── 00
│   ├── 01
│   ├── ...
│   ├── 41
│   ├── 42
│   ├── 00000.ppm
│   ├── 00001.ppm
│   ├── 00002.ppm
│   ├── ...
│   ├── 00597.ppm
│   ├── 00598.ppm
│   ├── 00599.ppm
│   ├── ex.txt
│   ├── gt.txt
│   └── ReadMe.txt
└── TestIJCNN2013
    └── TestIJCNN2013Download
        ├── 00000.ppm
        ├── 00001.ppm
        ├── ...
        ├── 00298.ppm
        ├── 00299.ppm
        └── ReadMe.txt

The TrainIJCNN2013 directory first contains the 43 class folders, which hold a few sample images belonging to the respective classes. All the actual training images, from 00000.ppm to 00599.ppm, are present directly inside TrainIJCNN2013. Along with that, it contains a gt.txt file which holds the ground truth annotations and classes, one bounding box per line, in the following format:

file_name;x_min;y_min;x_max;y_max;class_id

All the attributes are separated by semicolons. The first one is the image file name and the next four are the bounding box coordinates in the x_min, y_min, x_max, y_max format. In other words, they are the top-left and bottom-right coordinates of the traffic sign in a particular image. The final attribute is the class number. The ReadMe.txt file contains the mapping of class numbers to class names, along with some other information.
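As a quick illustration of this layout, the following sketch parses one gt.txt line into a small typed record. The GtBox and parse_gt_line names, as well as the sample line, are made up for illustration; only the field order comes from the description above.

```python
from typing import NamedTuple


class GtBox(NamedTuple):
    """One ground-truth box from gt.txt (hypothetical helper type)."""
    file_name: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    class_id: int

    @property
    def width(self) -> int:
        # Difference between the right and left coordinates.
        return self.x_max - self.x_min

    @property
    def height(self) -> int:
        # Difference between the bottom and top coordinates.
        return self.y_max - self.y_min


def parse_gt_line(line: str) -> GtBox:
    """Split a semicolon-separated gt.txt line into typed fields."""
    name, x1, y1, x2, y2, cls = line.strip().split(';')
    return GtBox(name, int(x1), int(y1), int(x2), int(y2), int(cls))


# Made-up example line following the gt.txt layout.
box = parse_gt_line("00042.ppm;100;200;130;232;1\n")
print(box.class_id, box.width, box.height)  # 1 30 32
```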

The TestIJCNN2013 directory directly contains the 300 test images without any ground truth information. This is because the test set results were meant to be submitted to the competition site for evaluation. But we will use these images for inference after training the model.

The Project Directory Structure

Now, let’s take a look at the entire directory structure for this post. This will give us a better idea of how to arrange each file and folder.

├── inference_outputs
│   ├── images [275 entries exceeds filelimit, not opening dir]
│   └── videos
│       └── video_1_trimmed_1.mp4
├── input
│   ├── inference_data
│   │   ├── video_1.mp4
│   │   └── video_1_trimmed_1.mp4
│   ├── TestIJCNN2013
│   │   └── TestIJCNN2013Download [301 entries exceeds filelimit, not opening dir]
│   ├── TrainIJCNN2013 [646 entries exceeds filelimit, not opening dir]
│   ├── train_images [425 entries exceeds filelimit, not opening dir]
│   ├── train_xmls [425 entries exceeds filelimit, not opening dir]
│   ├── valid_images [81 entries exceeds filelimit, not opening dir]
│   ├── valid_xmls [81 entries exceeds filelimit, not opening dir]
│   ├── all_annots.csv
│   ├── classes_list.txt
│   ├── gt.txt
│   ├── MY_README.txt
│   ├── signnames.csv
│   ├── train.csv
│   └── valid.csv
├── outputs
│   ├── last_model.pth
│   └── train_loss.png
├── src
│   ├── models
│   │   ├──
│   │   └──
│   ├── torch_utils
│   │   ├──
│   │   ├──
│   │   ├──
│   │   ├──
│   │   └──
│   ├──
│   ├──
│   ├──
│   ├──
│   ├──
│   ├──
│   ├──
│   ├──
│   └──

Okay! There are a lot of things to cover here and a few important ones too. So, let’s go through them.

  • We have already seen much of the content of the input directory in the previous section. In short, it contains all the data-related files and folders. The inference_data subdirectory contains the video files that we will use for inference. The classes_list.txt file contains all the class names in text format for easier management. signnames.csv contains the class number to class name mappings in CSV format. We will generate train_images, train_xmls, valid_images, valid_xmls, all_annots.csv, train.csv, and valid.csv through the data preparation scripts, which we will cover in the coding section.
  • The outputs and inference_outputs directories will contain the training and inference results respectively.
  • Now, coming to the src directory. We will cover most of its content in the coding section. Still, to give a brief idea, the following are the scripts and Python files it contains:
    • The models subdirectory contains the code to load two different Faster RCNN models.
    • The torch_utils subdirectory contains different utility scripts, such as the COCO mAP calculation scripts, the training and validation functions, and other helper functions. Most of the code in this subdirectory has been borrowed from the original PyTorch detection repository and slightly modified for our use case.
    • Other Python files directly in the src directory are custom written. These consist of training, inference, and data preparation scripts.

You will get access to all the code files, the trained model, outputs, and a few of the files in the input directory when downloading the zip file for this post.

Libraries and Frameworks

The two major libraries for this project are PyTorch and Albumentations. All the code has been developed using PyTorch 1.10.0 and Albumentations 1.1.0. Newer versions of these two libraries should not cause any issues either.

Starting from this section, we will discuss the coding part of the post. By now, you must have realized that there are a lot of Python files accompanying this post. You will get access to all of them from the download section, but here we will discuss only the most important parts of the code. For some files, we will not go into the details or even show the code, while for a few others, we will write and discuss the code in detail.

So, let’s get started with it.

Dataset Preprocessing and Creating XML Files

Further on, we will see that our PyTorch datasets and PyTorch data loaders accept images and corresponding XML files containing the annotations. Right now, the original images are present in the TrainIJCNN2013 and TestIJCNN2013 directories. Only TrainIJCNN2013 contains ground truth labels, so we will divide it into training and validation sets.

To create all the required files, we will use a set of scripts in the src directory. We have to execute the following scripts one after the other to get all the files that we need.



Right now, the ground truth labels of all the training images are in the gt.txt file. But we need them in CSV format so that we can create training and validation splits easily. The script will help us with this.

It will take the paths to the image folder and the ground truth text file, which are TrainIJCNN2013 and gt.txt respectively, and create a CSV file. This CSV file will contain the image names, the image sizes, the bounding box coordinates, and the class names.

The following is content for

"""
Script to create a CSV annotation file for all the images in a given folder
and a given text file.
The text file here is TrainIJCNN2013/gt.txt, so the code follows its format.
"""

import pandas as pd
import cv2

def text_to_csv(txt_file_name, csv_file_name):
    # Class names, indexed by class number.
    sign_names_df = pd.read_csv('../input/signnames.csv')
    class_names = sign_names_df.SignName.tolist()

    with open(txt_file_name) as f:
        all_lines = f.readlines()

    # Remove trailing newlines and skip empty lines.
    all_lines = [line.strip() for line in all_lines if line.strip()]

    file_name = []
    x_min = []
    y_min = []
    x_max = []
    y_max = []
    class_name = []
    width = []
    height = []
    for line in all_lines:
        # Format: file_name;x_min;y_min;x_max;y_max;class_id
        all_elements = line.split(';')
        image = cv2.imread(f"../input/TrainIJCNN2013/{all_elements[0]}")
        img_height, img_width, _ = image.shape

        file_name.append(all_elements[0])
        x_min.append(int(all_elements[1]))
        y_min.append(int(all_elements[2]))
        x_max.append(int(all_elements[3]))
        y_max.append(int(all_elements[4]))
        # Map the class number to its readable name.
        class_name.append(class_names[int(all_elements[5])])
        width.append(img_width)
        height.append(img_height)

    # One row per bounding box.
    csv_file = pd.DataFrame({
        'file_name': file_name, 'width': width, 'height': height,
        'class_name': class_name, 'x_min': x_min, 'y_min': y_min,
        'x_max': x_max, 'y_max': y_max
    })

    csv_file.to_csv(f"../input/{csv_file_name}", index=False)

text_to_csv('../input/TrainIJCNN2013/gt.txt', 'all_annots.csv')

It is a very simple script and quite self-explanatory.

Execute the following command from the src directory.


This should create the all_annots.csv file in the input directory with the following content structure.

Figure 3. A few rows from the traffic sign dataset CSV file.

As you can see, it contains all the information for images and corresponding ground truth.
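Since each row of all_annots.csv describes one bounding box, an image with several traffic signs spans several rows. The minimal sketch below groups such rows by image using only the standard csv module; the boxes_per_image name and the inline rows are invented sample data that only mimic the all_annots.csv columns.

```python
import csv
import io
from collections import defaultdict


def boxes_per_image(csv_text: str) -> dict:
    """Group (class_name, box) pairs by file_name for an all_annots.csv-style text."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        box = tuple(int(row[k]) for k in ('x_min', 'y_min', 'x_max', 'y_max'))
        grouped[row['file_name']].append((row['class_name'], box))
    return dict(grouped)


# Invented rows using the all_annots.csv column layout.
sample = """file_name,width,height,class_name,x_min,y_min,x_max,y_max
00001.ppm,1360,800,Stop,10,20,40,50
00001.ppm,1360,800,Yield,60,20,90,50
00002.ppm,1360,800,Priority road,5,5,35,35
"""
grouped = boxes_per_image(sample)
print({name: len(boxes) for name, boxes in grouped.items()})
# {'00001.ppm': 2, '00002.ppm': 1}
```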


Now, we will execute the script that creates the train.csv and valid.csv files in the input directory by randomly splitting the all_annots.csv file.

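"""
Script to create a training and validation CSV file.
"""

import pandas as pd
import shutil
import os

def train_valid_split(all_images_folder=None, all_annots_csv=None, split=0.15):
    all_df = pd.read_csv(all_annots_csv)
    # Shuffle the CSV file rows.
    # Note: each row is one box, so boxes of the same image may end up
    # in different splits.
    all_df = all_df.sample(frac=1).reset_index(drop=True)
    len_df = len(all_df)
    train_split = int((1-split)*len_df)

    train_df = all_df[:train_split]
    valid_df = all_df[train_split:]

    os.makedirs('../input/train_images', exist_ok=True)
    os.makedirs('../input/valid_images', exist_ok=True)

    # Copy training images (each image once, even if it has several boxes).
    for image in train_df['file_name'].unique():
        shutil.copy(
            os.path.join(all_images_folder, image),
            os.path.join('../input/train_images', image)
        )
    train_df.to_csv('../input/train.csv', index=False)

    # Copy validation images.
    for image in valid_df['file_name'].unique():
        shutil.copy(
            os.path.join(all_images_folder, image),
            os.path.join('../input/valid_images', image)
        )
    valid_df.to_csv('../input/valid.csv', index=False)

train_valid_split(
    all_images_folder='../input/TrainIJCNN2013',
    all_annots_csv='../input/all_annots.csv',
    split=0.15
)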


Execute the script from the src directory.
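Because the split above works on rows, that is, on individual boxes, an image with several signs can end up partly in the training set and partly in the validation set. If you want to avoid that, one option is to shuffle and split the unique file names instead. The sketch below shows this alternative; it is not what the script above does, and the split_by_image name and the seed value are assumptions.

```python
import random


def split_by_image(file_names, valid_fraction=0.15, seed=42):
    """Shuffle unique image names and split them into train/valid sets."""
    # Sort first so that the result depends only on the seed, not on input order.
    unique_names = sorted(set(file_names))
    random.Random(seed).shuffle(unique_names)
    n_train = int((1 - valid_fraction) * len(unique_names))
    return set(unique_names[:n_train]), set(unique_names[n_train:])


# One entry per bounding box, so an image with two signs appears twice.
file_names = ['00000.ppm', '00000.ppm', '00001.ppm', '00002.ppm', '00003.ppm']
train_names, valid_names = split_by_image(file_names, valid_fraction=0.25)
print(len(train_names), len(valid_names))  # 3 1
```

With pandas, the two sets can then select the matching rows, for example all_df[all_df['file_name'].isin(train_names)], so that every box of an image stays in the same split.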


Executing Script

This is the final data processing script. This will create the XML files for training and validation which will contain all the information and annotations for a particular image.

The content of has been inspired by this code repository and modified for our use case:

You can execute the script using the following command.


This will create two directories, train_xmls and valid_xmls containing the training and validation XML files. The following block shows the content of one such XML file.

        <name>Right-of-way at the next intersection</name>

According to the current dataset splitting, we are using 15% of the data for validation and the rest for training. This amounts to 425 training examples and 81 validation examples.
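The XML creation script itself is not reproduced here. To give a rough idea of what such a file holds, here is a minimal sketch that builds a Pascal VOC-style annotation with Python's xml.etree.ElementTree. The exact tags and layout emitted by the original script may differ; the make_voc_xml name, the output path, and the sample values are assumptions.

```python
import xml.etree.ElementTree as ET


def make_voc_xml(file_name, width, height, objects):
    """Build a Pascal VOC-style annotation tree.

    `objects` is a list of (class_name, (x_min, y_min, x_max, y_max)) tuples.
    """
    root = ET.Element('annotation')
    ET.SubElement(root, 'filename').text = file_name
    size = ET.SubElement(root, 'size')
    ET.SubElement(size, 'width').text = str(width)
    ET.SubElement(size, 'height').text = str(height)
    ET.SubElement(size, 'depth').text = '3'  # three color channels
    for class_name, (x1, y1, x2, y2) in objects:
        obj = ET.SubElement(root, 'object')
        ET.SubElement(obj, 'name').text = class_name
        bndbox = ET.SubElement(obj, 'bndbox')
        for tag, value in zip(('xmin', 'ymin', 'xmax', 'ymax'), (x1, y1, x2, y2)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.ElementTree(root)


tree = make_voc_xml('00001.ppm', 1360, 800,
                    [('Right-of-way at the next intersection', (10, 20, 40, 50))])
# tree.write('../input/train_xmls/00001.xml')  # hypothetical output path
print(ET.tostring(tree.getroot(), encoding='unicode'))
```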

Python Files in torch_utils Directory

mAP (Mean Average Precision) is one of the most common metrics for evaluating object detection models. But implementing mAP from scratch can be tricky. So, instead of reinventing the wheel, we will take help from the very reliable PyTorch detection repository and make a few changes according to our task. The same goes for the training and evaluation functions.

The torch_utils directory holds four Python files. Let’s go over them in brief.

  • This contains the code for AP (Average Precision) and AR (Average Recall) calculation according to the MS COCO dataset standard.
  • This Python file contains the helper functions and classes that are used in the validation loop of to initialize the COCO API.
  • We also need classes and functions for logging of the metrics and calculation of losses. The contains the code for that.
  • This is like the training driver Python file. It contains the functions for training and validation loops. Those are train_one_epoch() and evaluate() functions.

Be sure to take a look at the above files before moving ahead. They will give much more clarity on the internal workings of the object detection code.
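COCO-style AP and AR are built on the Intersection over Union (IoU) between predicted and ground-truth boxes: a prediction counts as a true positive only if its IoU with a ground-truth box of the same class reaches the threshold (0.50, 0.75, and so on). The following self-contained sketch computes IoU for boxes in the x_min, y_min, x_max, y_max format. It is meant for intuition only and is not the code used in the torch_utils files.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Zero area if the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```

For a 30×30 pixel sign, a prediction shifted by just 5 pixels in both directions, box_iou((0, 0, 30, 30), (5, 5, 35, 35)), gives about 0.53. That passes the 0.50 threshold but fails 0.75, which hints at why small objects are hard under the stricter thresholds.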

Model Files in the models Directory

You may notice that the models directory contains three Python files. Each of these three contains the create_model() function that will load the respective PyTorch Faster RCNN model for traffic sign detection.

  • This loads the fasterrcnn_resnet50_fpn() model when the function is called. Of the three models here, this is perhaps the best in terms of mAP, but the slowest in terms of FPS, because the ResNet50 backbone is too large for real-time inference.
  • This loads the Faster RCNN model with the MobileNetV3 Large FPN backbone. It is much faster than the ResNet50 Faster RCNN model but inferior in terms of performance (mAP).
  • This model is quite similar to the above one but internally resizes the images to a lower resolution of 320 pixels. It is the fastest of the three, but it does not perform very well on small objects or when there is not enough training data.

For this post, we will primarily train the fasterrcnn_mobilenet_v3_large_fpn model. It should give a good balance between accuracy and speed and also allow us to run inference on videos at a good enough FPS. As the code contains two other models, you can switch models any time you want and experiment on your own. As a heads up, though, the MobileNetV3 320 FPN model will not perform very well here, both because of the limited amount of data and because the traffic signs are very small. Feel free to experiment with different augmentation techniques to try to improve its accuracy.

The File to Prepare PyTorch Datasets and Data Loaders

In the src directory, the contains the custom dataset class and the functions to create the training and validation data loaders. In its current state, it can load any object detection dataset, as long as you have the images and the XML files in the format we prepared earlier. The custom dataset class also supports different image formats like JPG, JPEG, PNG, and PPM.

We will not go into the details of the custom dataset class here. But it is very similar to the one described in this post. So, if you are interested in getting an in-depth explanation, please visit the link.
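To make the data flow a bit more concrete, here is a hedged sketch of the core step such a dataset class performs: reading one Pascal VOC XML annotation and turning each object into a box and an integer label, where the label is the index of the class name in the CLASSES list (index 0 being __background__). The parse_voc_target name and the shortened class list are assumptions. In a real pipeline, the lists would then be converted to tensors (for example with torch.as_tensor), since torchvision's Faster RCNN expects a target dictionary with boxes and labels tensors.

```python
import xml.etree.ElementTree as ET


def parse_voc_target(xml_text, classes):
    """Return (boxes, labels) lists from a Pascal VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes, labels = [], []
    for obj in root.iter('object'):
        # Label = position of the class name in `classes` (0 is background).
        labels.append(classes.index(obj.find('name').text))
        bb = obj.find('bndbox')
        boxes.append([float(bb.find(t).text) for t in ('xmin', 'ymin', 'xmax', 'ymax')])
    return boxes, labels


# Shortened, hypothetical class list; the real one lives in the config file.
classes = ['__background__', 'Speed limit (20km/h)', 'Speed limit (30km/h)']
xml_text = """<annotation>
  <object><name>Speed limit (30km/h)</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>40</xmax><ymax>50</ymax></bndbox>
  </object>
</annotation>"""
print(parse_voc_target(xml_text, classes))  # ([[10.0, 20.0, 40.0, 50.0]], [2])
```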

The File

In short, the contains a lot of helper functions and classes. These range from:

  • Helper functions to save the model.
  • To save images.
  • Annotate images with bounding boxes and text.
  • And even load the training and validation transforms and augmentation.

Out of all these, we will go through the training transforms and augmentations in detail here. The following code block contains the get_train_transform() function which transforms and augments the training data.

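import albumentations as A
from albumentations.pytorch import ToTensorV2

# Define the training transforms.
def get_train_transform():
    return A.Compose([
        A.MotionBlur(blur_limit=3, p=0.2),
        A.Blur(blur_limit=3, p=0.1),
        A.RandomBrightnessContrast(
            brightness_limit=0.2, p=0.5
        ),
        A.ColorJitter(p=0.5),  # parameters assumed; not shown in the original
        ToTensorV2(p=1.0),
    ], bbox_params={
        'format': 'pascal_voc',
        'label_fields': ['labels']
    })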

We are applying four different augmentations here. They are:

  • MotionBlur: To replicate the effect of an image taken from a speeding vehicle.
  • Blur: To randomly blur the images.
  • RandomBrightnessContrast: To introduce different brightness and contrast intensities.
  • ColorJitter: To change the color of the images.

The above augmentations should introduce enough variability into the training dataset to train a robust model. If you intend to change the augmentations or add new ones, be a bit careful. Some of the traffic signs are already pretty small, and unnecessary augmentations can make the dataset too difficult to learn. For example, we do not apply any flipping here, because flipping can change or destroy the meaning of a traffic sign: a horizontally flipped 'Keep right' sign looks like 'Keep left', and the digits on speed limit signs become mirrored. Flipping speed limit signs, in particular, should be completely avoided.

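After the augmentations, we convert the images to tensors. The bbox_params argument tells Albumentations that the boxes are in pascal_voc format (x_min, y_min, x_max, y_max in pixels) and that their class labels are passed in the labels field, so any transform that changes the image geometry is applied to the bounding boxes as well. None of the four augmentations above moves pixels, so here the boxes pass through unchanged.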
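To see what bbox_params handles for us, consider a geometric transform such as a horizontal flip (which we deliberately leave out). In pascal_voc format, flipping an image of width W maps a box (x_min, y_min, x_max, y_max) to (W - x_max, y_min, W - x_min, y_max). Albumentations performs this kind of box update automatically when bbox_params is set; the standalone hflip_box function below is hypothetical and only illustrates the arithmetic.

```python
def hflip_box(box, image_width):
    """Mirror a pascal_voc box (x_min, y_min, x_max, y_max) horizontally."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge and vice versa.
    return (image_width - x_max, y_min, image_width - x_min, y_max)


print(hflip_box((10, 20, 30, 40), image_width=100))  # (70, 20, 90, 40)
```

Note that the box coordinates stay valid after a flip; what breaks for traffic signs is the class label, which the box arithmetic cannot fix.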

The Configuration File

Next is the file. This will define all the training configurations that we need. Let’s take a look at the content of the file first.

import torch

BATCH_SIZE = 4 # increase / decrease according to GPU memory
RESIZE_TO = 512 # resize the image for training and transforms
NUM_EPOCHS = 200 # number of epochs to train for
NUM_WORKERS = 4 # number of data loader workers (value assumed; not shown in the original)

DEVICE = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# Image and label directories, relative to the src directory
# (all the scripts are executed from src).
TRAIN_DIR_IMAGES = '../input/train_images'
TRAIN_DIR_LABELS = '../input/train_xmls'
VALID_DIR_IMAGES = '../input/valid_images'
VALID_DIR_LABELS = '../input/valid_xmls'

# classes: 0 index is reserved for background
CLASSES = [
    '__background__',
    'Speed limit (20km/h)', 'Speed limit (30km/h)', 'Speed limit (50km/h)', 
    'Speed limit (60km/h)', 'Speed limit (70km/h)', 'Speed limit (80km/h)', 
    'End of speed limit (80km/h)', 'Speed limit (100km/h)', 
    'Speed limit (120km/h)', 'No passing', 
    'No passing for vehicles over 3.5 metric tons', 
    'Right-of-way at the next intersection', 'Priority road', 'Yield', 
    'Stop', 'No vehicles', 'Vehicles over 3.5 metric tons prohibited', 
    'No entry', 'General caution', 'Dangerous curve to the left', 
    'Dangerous curve to the right', 'Double curve', 'Bumpy road', 
    'Slippery road', 'Road narrows on the right', 'Road work', 
    'Traffic signals', 'Pedestrians', 'Children crossing', 
    'Bicycles crossing', 'Beware of ice/snow', 'Wild animals crossing', 
    'End of all speed and passing limits', 'Turn right ahead', 
    'Turn left ahead', 'Ahead only', 'Go straight or right', 
    'Go straight or left', 'Keep right', 'Keep left', 'Roundabout mandatory', 
    'End of no passing', 'End of no passing by vehicles over 3.5 metric tons'
]

# Number of classes including the background class.
NUM_CLASSES = len(CLASSES)

# whether to visualize images after creating the data loaders
# (default value assumed; set to True to preview transformed images)
VISUALIZE_TRANSFORMED_IMAGES = False

# location to save model and plots
OUT_DIR = '../outputs'

As we can see, we define almost all the training configurations here. We have the batch size, the image size for resizing during dataset processing, the number of epochs to train for, and the number of workers as well.

Then we have the paths for the training and validation images and the XML files. The CLASSES list defines all the class names. Note that for Faster RCNN model training, the first class has to be the __background__ class, which is present in our list as well.

There is another VISUALIZE_TRANSFORMED_IMAGES constant. If this is True, the program will show a few transformed images before the training begins. This is just to check what kind of images the model sees during training. The code for this is in the file.

Finally, the OUT_DIR defines where the loss plots and trained models will be saved.
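Because index 0 is reserved for __background__, the label the model predicts is shifted by one relative to the GTSDB class number (class 0, 'Speed limit (20km/h)', becomes label 1), and the number of classes given to the model must count the background too, that is, 43 + 1 = 44 for the full list. The short check below illustrates this with a truncated stand-in list; the helper names are hypothetical.

```python
# Truncated stand-in for the CLASSES list in the configuration file.
SIGN_NAMES = ['Speed limit (20km/h)', 'Speed limit (30km/h)', 'Speed limit (50km/h)']
CLASSES = ['__background__'] + SIGN_NAMES
NUM_CLASSES = len(CLASSES)  # includes the background class


def class_id_to_label(class_id):
    """Map a dataset class number (0-based, no background) to a model label."""
    return class_id + 1


def label_to_name(label):
    """Map a predicted label back to a readable class name."""
    return CLASSES[label]


print(NUM_CLASSES, class_id_to_label(0), label_to_name(1))
# 4 1 Speed limit (20km/h)
```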

The Script

There is just one other Python file we need to deal with before we can begin the traffic sign detection training using PyTorch and Faster RCNN.

It is the script. This is the executable driver script.

The next code block contains the import statements for

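from torch_utils.engine import (
    train_one_epoch, evaluate
)
from config import (
    DEVICE, NUM_CLASSES, NUM_EPOCHS, NUM_WORKERS,
    OUT_DIR, VISUALIZE_TRANSFORMED_IMAGES
)
from datasets import (
    create_train_dataset, create_valid_dataset, 
    create_train_loader, create_valid_loader
)
from models.fasterrcnn_mobilenetv3_large_fpn import create_model
# save_model and save_train_loss_plot are assumed to live in custom_utils too.
from custom_utils import (
    Averager, show_tranformed_image, save_model, save_train_loss_plot
)

import torch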

As discussed earlier, we will train the MobileNetV3 Large FPN Faster RCNN model.

Now, we have the rest of the code under the main block.

if __name__ == '__main__':
    train_dataset = create_train_dataset()
    valid_dataset = create_valid_dataset()
    train_loader = create_train_loader(train_dataset, NUM_WORKERS)
    valid_loader = create_valid_loader(valid_dataset, NUM_WORKERS)
    print(f"Number of training samples: {len(train_dataset)}")
    print(f"Number of validation samples: {len(valid_dataset)}\n")

    # Optionally show a few transformed training images before training.
    if VISUALIZE_TRANSFORMED_IMAGES:
        show_tranformed_image(train_loader)

    # Initialize the Averager class.
    train_loss_hist = Averager()
    # List to store the training loss values of all iterations
    # till the end, to plot a graph for all iterations.
    train_loss_list = []

    # Initialize the model and move it to the computation device.
    model = create_model(num_classes=NUM_CLASSES)
    model = model.to(DEVICE)
    # Total parameters and trainable parameters.
    total_params = sum(p.numel() for p in model.parameters())
    print(f"{total_params:,} total parameters.")
    total_trainable_params = sum(
        p.numel() for p in model.parameters() if p.requires_grad)
    print(f"{total_trainable_params:,} training parameters.\n")
    # Get the model parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    # Define the optimizer.
    # optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9, weight_decay=0.0005)
    optimizer = torch.optim.AdamW(params, lr=0.0001, weight_decay=0.0005)

    # LR will be zero as we approach `steps` number of epochs each time.
    # If `steps = 5`, LR will slowly reduce to zero every 5 epochs.
    # Here steps > NUM_EPOCHS, so no restart happens during training.
    steps = NUM_EPOCHS + 25
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        optimizer, T_0=steps, T_mult=1, verbose=True
    )

    for epoch in range(NUM_EPOCHS):
        # The argument list below is assumed; check the train_one_epoch()
        # signature in torch_utils/engine.py.
        _, batch_loss_list = train_one_epoch(
            model, optimizer, train_loader, DEVICE, epoch,
            train_loss_hist, print_freq=100, scheduler=scheduler
        )

        evaluate(model, valid_loader, device=DEVICE)

        # Add the current epoch's batch-wise losses to the `train_loss_list`.
        train_loss_list.extend(batch_loss_list)

        # Save the current epoch model.
        save_model(OUT_DIR, epoch, model, optimizer)

        # Save loss plot.
        save_train_loss_plot(OUT_DIR, train_loss_list)

Let’s focus on the important stuff here.

  • First is the optimizer. We are using the AdamW optimizer instead of SGD here. In my experiments, AdamW combined with CosineAnnealingWarmRestarts gave a slightly lower loss and also seemed to improve the inference predictions.
  • Along similar lines, we are using CosineAnnealingWarmRestarts for learning rate scheduling. You may notice that we train for 200 epochs, but the first restart is scheduled for epoch 225 (steps = NUM_EPOCHS + 25). As a result, no restart happens during training: the learning rate decays along a single cosine curve for the whole run and stays slightly above 0 during the last few epochs (around 0.000003 at epoch 199 in the training log). This strategy seems to improve the mAP a bit.
  • After that we have the for loop starting the training and evaluation for 200 epochs.
  • After every epoch we save the current model and the training loss plot to disk.
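Within its first cycle, CosineAnnealingWarmRestarts follows the cosine annealing curve eta = eta_min + (eta_max - eta_min) * (1 + cos(pi * T_cur / T_0)) / 2, where T_cur is the (possibly fractional) number of epochs since the last restart. The sketch below evaluates this closed form directly, assuming eta_min = 0 (the PyTorch default) and the base learning rate of 0.0001 used above, to compare a cycle length of 200 with the 225 used here at the last epoch. The cosine_lr helper is hypothetical and does not call PyTorch.

```python
import math


def cosine_lr(epoch, t_0, base_lr=1e-4, eta_min=0.0):
    """Learning rate inside the first cosine cycle (0 <= epoch <= t_0)."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_0)) / 2


for t_0 in (200, 225):
    print(t_0, f"{cosine_lr(199, t_0):.2e}")
# T_0 = 200 gives roughly 6e-09 (practically zero),
# T_0 = 225 gives roughly 3.3e-06.
```

With T_0 = 200, the last epoch would run with an essentially zero learning rate; pushing T_0 past NUM_EPOCHS keeps a small but non-zero learning rate until the end.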

Although we did not go into much of the coding detail here, it was still a lot to cover in terms of the entire training pipeline. Now, we are all set to start the training for traffic sign detection using PyTorch and Faster RCNN.

Executing for Traffic Sign Detection using PyTorch and Faster RCNN

Open your terminal/command line in the src directory and execute the script. If you are training locally, make sure that you have a GPU. Even if you cannot train the model, you have access to the trained model when you download the zip file for the post. So, you can run the inference without any issues.


The following is the truncated output from the terminal.

Number of training samples: 425
Number of validation samples: 81

19,145,479 total parameters.
19,086,583 training parameters.

Epoch     0: adjusting learning rate of group 0 to 1.0000e-04.
Epoch: [0]  [  0/107]  eta: 0:01:26  lr: 0.000001  loss: 4.0942 (4.0942)  loss_classifier: 3.9772 (3.9772)  loss_box_reg: 0.0659 (0.0659)  loss_objectness: 0.0467 (0.0467)  loss_rpn_box_reg: 0.0044 (0.0044)  time: 0.8070  data: 0.2524  max mem: 966
Epoch: [0]  [100/107]  eta: 0:00:00  lr: 0.000101  loss: 0.4165 (0.5722)  loss_classifier: 0.2571 (0.4177)  loss_box_reg: 0.1338 (0.1322)  loss_objectness: 0.0123 (0.0175)  loss_rpn_box_reg: 0.0044 (0.0048)  time: 0.0637  data: 0.0056  max mem: 1183
Epoch: [0]  [106/107]  eta: 0:00:00  lr: 0.000100  loss: 0.4658 (0.5756)  loss_classifier: 0.3119 (0.4180)  loss_box_reg: 0.1559 (0.1359)  loss_objectness: 0.0110 (0.0169)  loss_rpn_box_reg: 0.0043 (0.0049)  time: 0.0598  data: 0.0051  max mem: 1183
Epoch: [0] Total time: 0:00:07 (0.0706 s / it)
creating index...
index created!
Test:  [ 0/21]  eta: 0:00:03  model_time: 0.0253 (0.0253)  evaluator_time: 0.0019 (0.0019)  time: 0.1899  data: 0.1577  max mem: 1183
Test:  [20/21]  eta: 0:00:00  model_time: 0.0234 (0.0230)  evaluator_time: 0.0019 (0.0019)  time: 0.0305  data: 0.0041  max mem: 1183
Test: Total time: 0:00:00 (0.0398 s / it)
Averaged stats: model_time: 0.0234 (0.0230)  evaluator_time: 0.0019 (0.0019)
Accumulating evaluation results...
DONE (t=0.05s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.022
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.041
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.030
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.015
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.178
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.051
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.052
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.052
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.042
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.178
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Epoch: [199]  [  0/107]  eta: 0:00:31  lr: 0.000004  loss: 0.2822 (0.2822)  loss_classifier: 0.1035 (0.1035)  loss_box_reg: 0.1786 (0.1786)  loss_objectness: 0.0001 (0.0001)  loss_rpn_box_reg: 0.0000 (0.0000)  time: 0.2901  data: 0.2335  max mem: 1183
Epoch: [199]  [100/107]  eta: 0:00:00  lr: 0.000003  loss: 0.2186 (0.2498)  loss_classifier: 0.0370 (0.0491)  loss_box_reg: 0.2090 (0.2003)  loss_objectness: 0.0001 (0.0002)  loss_rpn_box_reg: 0.0000 (0.0001)  time: 0.0546  data: 0.0052  max mem: 1183
Epoch: [199]  [106/107]  eta: 0:00:00  lr: 0.000003  loss: 0.2169 (0.2474)  loss_classifier: 0.0388 (0.0486)  loss_box_reg: 0.1840 (0.1984)  loss_objectness: 0.0001 (0.0002)  loss_rpn_box_reg: 0.0001 (0.0001)  time: 0.0528  data: 0.0050  max mem: 1183
Epoch: [199] Total time: 0:00:06 (0.0571 s / it)
creating index...
index created!
Test:  [ 0/21]  eta: 0:00:04  model_time: 0.0243 (0.0243)  evaluator_time: 0.0023 (0.0023)  time: 0.2298  data: 0.2007  max mem: 1183
Test:  [20/21]  eta: 0:00:00  model_time: 0.0221 (0.0217)  evaluator_time: 0.0026 (0.0024)  time: 0.0308  data: 0.0050  max mem: 1183
Test: Total time: 0:00:00 (0.0431 s / it)
Averaged stats: model_time: 0.0221 (0.0217)  evaluator_time: 0.0026 (0.0024)
Accumulating evaluation results...
DONE (t=0.06s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.210
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.323
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.212
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.227
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.534
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.293
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.300
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.300
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.283
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.583
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

After 200 epochs, the mAP at 0.50 IoU is 32.3% and the mAP averaged over IoU=0.50:0.95 is 21.0%. This is not great, but not terrible either. Remember that the traffic signs are small and hard to detect, and that the model uses the lightweight MobileNetV3 backbone.
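The `area= small/medium/large` rows in the log follow the COCO convention used by pycocotools: small boxes cover less than 32² pixels, medium boxes between 32² and 96², and large boxes more than 96². COCOeval prints -1.000 for a range that has no ground-truth boxes, which is why both large-object rows are -1.000 here: no validation sign falls into the large range. The minimal sketch below buckets boxes the same way, assuming boxes are `(x1, y1, x2, y2)` in pixels and that "area" means box width times height. The helper names and sample boxes are hypothetical.

```python
# Minimal sketch: bucket boxes by COCO area ranges (small / medium / large).
# Assumes (x1, y1, x2, y2) pixel boxes and area = width * height.
# Helper names and sample boxes are hypothetical.
from collections import Counter

SMALL_MAX = 32 ** 2   # 1024 px^2
MEDIUM_MAX = 96 ** 2  # 9216 px^2


def box_area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def area_bucket(area):
    # Half-open ranges for simplicity; pycocotools uses inclusive bounds,
    # so a box exactly on a boundary counts for both neighbouring ranges there.
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


def count_buckets(boxes):
    return Counter(area_bucket(box_area(b)) for b in boxes)


# Hypothetical boxes: 20x20, 30x30, and 50x60 pixel signs.
boxes = [(100, 200, 120, 220), (400, 300, 430, 330), (700, 100, 750, 160)]
counts = count_buckets(boxes)
print(dict(counts))  # {'small': 2, 'medium': 1}
# An empty bucket is the case that COCOeval reports as -1.000.
print(counts["large"])  # 0
```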

The following is the iteration-wise loss graph that is saved to disk.

Figure 4. Loss after training the PyTorch Faster RCNN model.

The training loss kept decreasing until the end of training, so perhaps a few more epochs would have helped. For now, though, let's use this model to run inference.

Inference using the Trained Faster RCNN Model

First, we will run inference on images and then on videos. There are two separate scripts: one for image inference and one for video inference.

Both the image inference and video inference were run on a machine with RTX 3080 GPU (10 GB), 32 GB RAM, and a 10th generation i7 processor.

Running Inference on Images for Traffic Sign Detection using PyTorch

Before running the script, take a few minutes to go through it. A few important features of the script:

  • It accepts either a path to a directory containing images or a path to a single image file, passed on the command line with the --input flag.
  • It also accepts an integer through the --resize flag to resize the image to a larger or smaller size. For example, providing 300 resizes the image to 300×300.
  • When given a directory, it supports multiple image formats: JPG, JPEG, PNG, and PPM.
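The input handling described in these bullets can be sketched as follows. This is a minimal, hypothetical version: `collect_image_paths`, `target_size`, and `parse_args` are illustrative names, not functions from the post's script, and the sketch only gathers file paths and computes the target size, leaving image loading and the model call out.

```python
# Minimal sketch of the described input handling: --input takes a directory or
# a single image file, --resize takes an integer. Names are hypothetical.
import argparse
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".ppm"}


def collect_image_paths(input_path):
    """Return a sorted list of image paths from a directory or a single file."""
    if os.path.isdir(input_path):
        return sorted(
            os.path.join(input_path, name)
            for name in os.listdir(input_path)
            # Match extensions case-insensitively (".JPG" and ".jpg" both count).
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
        )
    if os.path.isfile(input_path):
        return [input_path]
    raise FileNotFoundError(f"No such file or directory: {input_path}")


def target_size(orig_w, orig_h, resize=None):
    """Return the (width, height) to use; None keeps the original size."""
    if resize is None:
        return orig_w, orig_h
    return resize, resize  # e.g. 300 -> 300x300


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True,
                        help="path to an image file or a directory of images")
    parser.add_argument("--resize", type=int, default=None,
                        help="resize images to SIZE x SIZE before inference")
    return parser.parse_args(argv)
```

For example, `args = parse_args(["--input", "../input/TestIJCNN2013/TestIJCNN2013Download", "--resize", "300"])` followed by `collect_image_paths(args.input)` would list every JPG, JPEG, PNG, or PPM file in that directory.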

We will be carrying out inference on the test images present in the input/TestIJCNN2013/TestIJCNN2013Download directory.

Execute the following command within the src directory.

python --input ../input/TestIJCNN2013/TestIJCNN2013Download

The average FPS over all the images was around 73, which is pretty good for a Faster RCNN model. This is most likely due to the lighter MobileNetV3 Large backbone.
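An "average FPS" figure like this is typically obtained by timing each forward pass and averaging. The sketch below is an assumption about how such a number can be measured, not the post's script. `timed_fps`, `average_fps`, and `overall_fps` are hypothetical names, `infer` stands in for the model call, and `sync` is an optional callable (such as `torch.cuda.synchronize`) so that queued GPU work is finished before the clock is read.

```python
# Minimal sketch: per-input FPS timing and averaging (stdlib only).
# `infer` stands in for the model's forward pass; `sync` is an optional
# callable such as torch.cuda.synchronize. Names are hypothetical.
import time


def timed_fps(infer, inputs, sync=None):
    """Run `infer` on each input and return a list of per-input FPS values."""
    fps_values = []
    for x in inputs:
        if sync is not None:
            sync()  # wait for previously queued GPU work before starting the clock
        start = time.perf_counter()
        infer(x)
        if sync is not None:
            sync()  # make sure this input's GPU work has finished
        elapsed = time.perf_counter() - start
        fps_values.append(1.0 / elapsed if elapsed > 0 else float("inf"))
    return fps_values


def average_fps(fps_values):
    """Arithmetic mean of per-input FPS values."""
    return sum(fps_values) / len(fps_values)


def overall_fps(num_inputs, total_seconds):
    """Inputs processed per second over the whole run."""
    return num_inputs / total_seconds
```

With a torchvision detection model in eval mode and under `torch.no_grad()`, this might be called as `timed_fps(lambda img: model([img]), images, sync=torch.cuda.synchronize)`. Averaging per-image FPS never gives a lower number than dividing the image count by the total time (by the arithmetic-mean/harmonic-mean inequality), so the two ways of reporting "average FPS" can differ.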

Let’s take a look at some of the predictions.

Figure 5. Traffic sign detection inference results.

We can see that the model predicts traffic signs like Keep right, Speed limit (50km/h), and Speed limit (30km/h) correctly most of the time.

However, it makes mistakes when a traffic sign is very small or when two classes look alike, such as Slippery road and Beware of ice/snow.

It may simply be that the MobileNetV3 Large backbone is not strong enough for this task. The small number of training examples can also be an issue. Still, let's move ahead with the video inference.

Running Inference on Videos

While executing the video inference script, we need to provide the path to the input video file. It also supports the same --resize option as the image inference script.

Execute the following command to run inference on a video.

python --input ../input/inference_data/video_1_trimmed_1.mp4
Clip 1. Traffic sign detection video inference using MobileNetV3 Large FPN Faster RCNN model.

The video inference is good only when a traffic sign is very close to the camera. Shadows and smaller traffic signs result in wrong predictions. Also, when the traffic signs are far away, the predictions fluctuate a lot from frame to frame. This highlights the limitations of a smaller backbone like MobileNetV3.

Results using ResNet50 FPN Faster RCNN Model

Just for the sake of comparison, the following are the results using the fasterrcnn_resnet50_fpn model (only the first and last epochs are shown).

Number of training samples: 425
Number of validation samples: 81

41,514,411 total parameters.
41,292,011 training parameters.

Epoch     0: adjusting learning rate of group 0 to 1.0000e-04.
/home/sovit/miniconda3/envs/torch110/lib/python3.9/site-packages/torch/ UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1634272204863/work/aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Epoch: [0]  [  0/107]  eta: 0:03:04  lr: 0.000001  loss: 4.1653 (4.1653)  loss_classifier: 4.0820 (4.0820)  loss_box_reg: 0.0356 (0.0356)  loss_objectness: 0.0462 (0.0462)  loss_rpn_box_reg: 0.0014 (0.0014)  time: 1.7221  data: 0.2689  max mem: 3007
Epoch: [0]  [100/107]  eta: 0:00:01  lr: 0.000101  loss: 0.3237 (0.5056)  loss_classifier: 0.1659 (0.3279)  loss_box_reg: 0.1106 (0.1345)  loss_objectness: 0.0145 (0.0326)  loss_rpn_box_reg: 0.0046 (0.0106)  time: 0.2310  data: 0.0051  max mem: 3433
Epoch: [0]  [106/107]  eta: 0:00:00  lr: 0.000100  loss: 0.3030 (0.4956)  loss_classifier: 0.1771 (0.3202)  loss_box_reg: 0.0941 (0.1332)  loss_objectness: 0.0167 (0.0318)  loss_rpn_box_reg: 0.0046 (0.0104)  time: 0.2231  data: 0.0049  max mem: 3433
Epoch: [0] Total time: 0:00:26 (0.2437 s / it)
creating index...
index created!
Test:  [ 0/21]  eta: 0:00:05  model_time: 0.1001 (0.1001)  evaluator_time: 0.0048 (0.0048)  time: 0.2562  data: 0.1471  max mem: 3433
Test:  [20/21]  eta: 0:00:00  model_time: 0.0980 (0.0949)  evaluator_time: 0.0041 (0.0042)  time: 0.1050  data: 0.0047  max mem: 3433
Test: Total time: 0:00:02 (0.1139 s / it)
Averaged stats: model_time: 0.0980 (0.0949)  evaluator_time: 0.0041 (0.0042)
Accumulating evaluation results...
DONE (t=0.07s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.027
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.060
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.013
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.027
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.089
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.122
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.122
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.119
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Epoch: [199]  [  0/107]  eta: 0:00:46  lr: 0.000004  loss: 0.0027 (0.0027)  loss_classifier: 0.0019 (0.0019)  loss_box_reg: 0.0008 (0.0008)  loss_objectness: 0.0000 (0.0000)  loss_rpn_box_reg: 0.0000 (0.0000)  time: 0.4340  data: 0.2040  max mem: 3433
Epoch: [199]  [100/107]  eta: 0:00:01  lr: 0.000003  loss: 0.0074 (0.0080)  loss_classifier: 0.0027 (0.0033)  loss_box_reg: 0.0034 (0.0044)  loss_objectness: 0.0000 (0.0001)  loss_rpn_box_reg: 0.0001 (0.0002)  time: 0.2263  data: 0.0051  max mem: 3433
Epoch: [199]  [106/107]  eta: 0:00:00  lr: 0.000003  loss: 0.0058 (0.0080)  loss_classifier: 0.0025 (0.0033)  loss_box_reg: 0.0029 (0.0044)  loss_objectness: 0.0000 (0.0001)  loss_rpn_box_reg: 0.0001 (0.0002)  time: 0.2185  data: 0.0050  max mem: 3433
Epoch: [199] Total time: 0:00:24 (0.2274 s / it)
creating index...
index created!
Test:  [ 0/21]  eta: 0:00:06  model_time: 0.0972 (0.0972)  evaluator_time: 0.0027 (0.0027)  time: 0.3000  data: 0.1973  max mem: 3433
Test:  [20/21]  eta: 0:00:00  model_time: 0.0961 (0.0930)  evaluator_time: 0.0023 (0.0023)  time: 0.1009  data: 0.0041  max mem: 3433
Test: Total time: 0:00:02 (0.1130 s / it)
Averaged stats: model_time: 0.0961 (0.0930)  evaluator_time: 0.0023 (0.0023)
Accumulating evaluation results...
DONE (t=0.07s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.522
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.636
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.624
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.523
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.611
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.613
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.663
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.663
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.663
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.661
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
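Two numbers in the ResNet50 log header can be checked by hand. The 107 training and 21 validation iterations per epoch match a batch size of 4 (the MobileNetV3 log shows the same counts). The 222,400 non-trainable parameters equal exactly the convolution weights of the ResNet50 stem (conv1) plus layer1. That is consistent with torchvision's default of keeping those two stages frozen when pretrained weights are loaded (trainable_backbone_layers defaults to 3). Batch-norm layers add nothing to the count because, with pretrained weights, torchvision uses FrozenBatchNorm2d, whose statistics are buffers rather than parameters. The sketch below is arithmetic only under those assumptions; it does not build or inspect the model, and the helper names are hypothetical.

```python
# Worked check of the ResNet50 FPN log header (arithmetic only; no model is built).
import math


def iterations(num_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution without bias (ResNet convs use bias=False)."""
    return c_in * c_out * k * k


def bottleneck_params(c_in, width, downsample):
    """Conv weights of one ResNet bottleneck block (expansion factor 4)."""
    out = width * 4
    n = conv_params(c_in, width, 1) + conv_params(width, width, 3) + conv_params(width, out, 1)
    if downsample:
        n += conv_params(c_in, out, 1)  # 1x1 projection shortcut
    return n


# Log values: 425 training / 81 validation samples, 107 / 21 iterations.
print(iterations(425, 4), iterations(81, 4))  # 107 21

# Log values: 41,514,411 total and 41,292,011 trainable parameters.
frozen = 41_514_411 - 41_292_011

stem = conv_params(3, 64, 7)  # conv1
layer1 = bottleneck_params(64, 64, downsample=True) + 2 * bottleneck_params(256, 64, downsample=False)
print(frozen, stem + layer1)  # 222400 222400
```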

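With the ResNet50 FPN backbone, the final training loss is much lower (0.0080 vs 0.2474) and the mAP is much higher: 52.2% vs 21.0% for IoU=0.50:0.95, and 63.6% vs 32.3% at 0.50 IoU. This is expected, as ResNet50 is a much larger backbone.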
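For a compact view of the trade-off, the sketch below copies the final-epoch numbers from the two logs and computes ResNet50-to-MobileNetV3 ratios. The dictionary keys are illustrative names, not identifiers from the post's code. With these values, ResNet50 gives about 2.5x the mAP@0.50:0.95 and about 2x the mAP@0.50, at roughly 4x the time per training iteration and per validation forward pass.

```python
# Final-epoch numbers copied from the two logs above; key names are illustrative.
results = {
    "mobilenet_v3_large_fpn": {
        "map_50_95": 0.210, "map_50": 0.323,
        "train_s_per_it": 0.0571,   # "Total time ... s / it" for epoch 199
        "val_model_time": 0.0217,   # averaged validation model_time
    },
    "resnet50_fpn": {
        "map_50_95": 0.522, "map_50": 0.636,
        "train_s_per_it": 0.2274,
        "val_model_time": 0.0930,
    },
}


def ratio(metric, a="resnet50_fpn", b="mobilenet_v3_large_fpn"):
    """How many times larger metric `metric` is for model `a` than for model `b`."""
    return results[a][metric] / results[b][metric]


for metric in ("map_50_95", "map_50", "train_s_per_it", "val_model_time"):
    print(f"{metric}: {ratio(metric):.2f}x")
# map_50_95: 2.49x, map_50: 1.97x, train_s_per_it: 3.98x, val_model_time: 4.29x
```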

Further Steps

As a further step, you can run inference with the trained ResNet50 FPN Faster RCNN model. Given its higher validation mAP, it should give better results; be sure to share them in the comment section.

You can also try training the fasterrcnn_mobilenet_v3_large_320_fpn model and check how it performs.

Summary and Conclusion

We covered a lot about traffic sign detection using PyTorch and Faster RCNN in this post. We started by exploring the dataset and the Python code files, and then trained a Faster RCNN model with a MobileNetV3 Large FPN backbone. During inference, we saw where the model struggles (small and visually similar traffic signs), and we also saw that the ResNet50 FPN Faster RCNN gives a higher mAP. In future posts, we will explore more detection projects along these lines. I hope that you learned something new from this post.

If you have any doubts, thoughts, or suggestions, please leave them in the comment section. I will surely address them.

You can contact me using the Contact section. You can also find me on LinkedIn and Twitter.

Credits for Video Used for Inference

  • Trimmed version of video_1.mp4 in input/inference_data:
