Apple Scab Detection using PyTorch Faster RCNN


Apple Scab Detection using PyTorch Faster RCNN

When dealing with plant pathology using deep learning, sometimes, image recognition is not enough. One such example is the identification of the apple scab disease. The last two blog posts covered the recognition of apple scab in leaves and fruits. In the apple fruit, pinpointing where the actual disease is, will be more helpful. Extending our “deep learning for plant pathology” series, in this tutorial, we will use PyTorch Faster RCNN for apple scab detection.

Using object detection for apple scab detection will solve two problems for us that image recognition could not.

  • When using image recognition for apple scab identification, the model was only able to tell whether the leaf or fruit has been affected by the disease or not. Then we had to analyze the image manually to check which part of the leaf or fruit the disease is affecting.
  • When using object detection for apple scab detection, the model will already give us the localization predictions. We just need to look at those specific places and confirm whether the apple scab disease is present or not. This is going to be much more convenient.

An example of apple scab detection using PyTorch and Faster RCNN
Figure 1. An example of apple scab detection using PyTorch and Faster RCNN. We will train a Faster RCNN ResNet50 FPN V2 model in this tutorial which will be capable to carry out such detections.

Let’s check out the topics that we will cover in this tutorial:

  • We will start with exploring the apple scab detection dataset.
  • Then we will move to the discussion of the code and model that we will use for object detection.
  • We will use the Faster RCNN ResNet FPN V2 PyTorch model for training.
  • After the training, we will analyze the results.
  • Next, we will carry out inference using some videos from the internet for apple scab detection.
  • Finally, we will finish the tutorial by discussing the improvements that we can add to this project.

Let’s jump into the details without any further delay.

Preparing the Apple Scab Detection Dataset

We start with the AppeScabFDs dataset from Kaggle to prepare the apple scab detection dataset.

We also used this dataset for apple fruit scab recognition in the previous post.

But as we know, this dataset only supports image classification out of the box.

So, how do we prepare it for detection?

We have to go through the manual and tedious process of annotating the image ourselves. But for this project, the annotation of the images has already been done.

LabelImg was used to annotate the scab regions of the apples (207 images) and create the annotation files in Pascal VOC XML format.

Annotating apple scab images using LabelImg
Figure 2. Annotating apple scab images using LabelImg.

It is important to note that we only use the images under the Scab directory because in object detection we only need the true positives while annotating the images.

The final dataset is available here on Kaggle. The dataset has already been split into 183 training images and 24 validation images.

The following are a few images with the ground truth annotations from the final training set.

Apple cab images with the ground truth annotations.
Figure 3. Apple scab images with the ground truth annotations.

You may go ahead and download the dataset for now. In the next section, we will check how to structure the entire project directory.

Directory Structure

The following block shows the directory structure for the entire project.

├── data
│   ├── apple_scab_detection
│   │   ├── train_images
│   │   ├── train_labels
│   │   ├── valid_images
│   │   └── valid_labels
│   └── inference_data
│       ├── video_1.mp4
│       └── video_1_trimmed.mp4
├── data_configs
│   └── apple_scab.yaml
├── datasets.py
├── inference.py
├── inference_video.py
├── models
│   ├── create_fasterrcnn_model.py
│   ├── fasterrcnn_resnet50_fpn.py
│   ├── fasterrcnn_resnet50_fpn_v2.py
│   └── __init__.py
├── notebooks
│   └── visualizations.ipynb
├── outputs
│   ├── inference
│   │   └── res_1
│   └── training
│       └── epochs50_allaug
├── random_split.py
├── README.md
├── requirements.txt
├── torch_utils
│   ├── coco_eval.py
│   ├── coco_utils.py
│   ├── engine.py
│   ├── __init__.py
│   ├── README.md
│   └── utils.py
├── train.py
└── utils
    ├── annotations.py
    ├── general.py
    ├── __init__.py
    ├── logging.py
    └── transforms.py
  • The data directory contains the apple_scab_detection subdirectory with the training & validation images as well as the XML annotation files. We also have an inference_data subdirectory with any images or videos that we will like to carry out inference on.
  • The data_config directory contains a YAML file holding all the dataset and class information.
  • Other than that we have several scripts for training, inference, and utility functions.
  • The train.py is the driver script that we will use for training.

All the code files along with the proper structure are available for download. To run the training locally, you just need to download the dataset and arrange it in the above structure.

The Object Detection Codebase

We are not going into too many details about the object detection codebase here. This is a smaller part of a much larger project that you can find here on GitHub.

We have removed all the parts that we do not need for the apple scab detection training.

Still, if you want to get into the details, the repository is the best place to do so.

You can also go through the Fine Tuning Faster RCNN ResNet50 FPN V2 post to get a few more details.

Download Code

Installing the Requirements

If you are on Linux, you can just install the requirements from the requirements.txt file to get started right away.

pip install -r requirements.txt

For Windows OS, you may need to follow these instructions so that you can set up the pycocotools (for mAP calculation) correctly.

The Augmentations Applied to the Apple Scab Detection Dataset

The dataset does not contain too many images. We have just 183 images for training and 24 for validation.

We have to get the most out of what we have. Using extensive image augmentation can introduce enough variability into the dataset.

Fortunately, the code that we use supports mosaic augmentations and several other albumentations based augmentations.

Take a look at the following image. It shows how a single image looks when 4 images are combined using mosaic augmentation and all the other augmentations are also applied.

Mosaic augmentation applied to the apple scab images during training.
Figure 4. Mosaic augmentation applied to the apple scab images during training.

In mosaic augmentation, we combine four images by cropping them in different ratios. This has proved to be very beneficial for training object detection models.

Training the Faster RCNN ResNet FPN V2 model for Apple Scab Detection

Let’s get into the training part of this tutorial.

All the training and inference experiments in this tutorial have been conducted on a machine with 10 GB RTX 3080 GPU, 32 GB of RAM, and an i7 10th generation processor.

The training and inference scripts have been arranged in a very simple manner.

To start the training, execute the following command within the project directory.

python train.py \
--model fasterrcnn_resnet50_fpn_v2 \
--batch 4 \
--epochs 50 \
--use-train-aug \
--data data_configs/apple_scab.yaml \
--name epochs50_allaug

Let’s go over all the command line arguments.

  • --model: This takes a key value to inform the training script which model to use. This code base supports fasterrcnn_resnet50_fpn_v2 (new model) and fasterrcnn_resnet50_fpn. Here, we are using the newer Faster RCNN ResNet50 FPN V2 model as it gives better results.
  • --batch: The training and validation batch size here is 4. This consumes around 8 GB of VRAM with 640×640 resolution training images. If you are training locally and face OOM (Out Of Memory) error, please consider reducing the batch size to 2.
  • --epochs: We are training the model for 50 epochs.
  • --use-train-aug: By default, the training script uses mosaic augmentation. Along with that, we are also passing this boolean flag that tells the script to use the additional augmentations.
  • --data: This flag accepts the path to the dataset YAML file. The YAML file contains the path to images, annotation, class names, and the number of classes.
  • --name: We can provide an explanatory project name using this flag. This will create a subdirectory under outputs/training directory using the same name. It helps to easily differentiate between different training experiments.

The following is the truncated output from the final epoch.

Epoch: [49]  [ 0/45]  eta: 0:02:46  lr: 0.001000  loss: 0.5461 (0.5461)  loss_classifier: 0.1044 (0.1044)  loss_box_reg: 0.2099 (0.2099)  loss_objectness: 0.0374 (0.0374)  loss_rpn_box_reg: 0.1944 (0.1944)  time: 3.6905  data: 3.3475  max mem: 5192
Epoch: [49]  [44/45]  eta: 0:00:00  lr: 0.001000  loss: 0.4430 (0.4158)  loss_classifier: 0.0815 (0.0777)  loss_box_reg: 0.1998 (0.1888)  loss_objectness: 0.0209 (0.0365)  loss_rpn_box_reg: 0.1209 (0.1128)  time: 0.7624  data: 0.4315  max mem: 5192
Epoch: [49] Total time: 0:00:39 (0.8671 s / it)
Test:  [0/6]  eta: 0:00:05  model_time: 0.1414 (0.1414)  evaluator_time: 0.0038 (0.0038)  time: 0.9233  data: 0.7348  max mem: 5192
Test:  [5/6]  eta: 0:00:00  model_time: 0.1357 (0.1267)  evaluator_time: 0.0046 (0.0055)  time: 0.2663  data: 0.1250  max mem: 5192
Test: Total time: 0:00:01 (0.2734 s / it)
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.37435535204021597
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.7532503816294729
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.36091233591814287
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.3300101898839323
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.6437478033517637
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.7267326732673267
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.09318181818181817
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.38636363636363635
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.421969696969697
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.37345132743362836
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.7066666666666667
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.725

But this was not the best epoch. The following are the mAP results from the best epoch, epoch 43 (the terminal outputs epoch number starting from 0, so, 0 to 49 in our case).

Epoch: [42]  [ 0/45]  eta: 0:02:42  lr: 0.001000  loss: 0.3858 (0.3858)  loss_classifier: 0.0758 (0.0758)  loss_box_reg: 0.1718 (0.1718)  loss_objectness: 0.0257 (0.0257)  loss_rpn_box_reg: 0.1126 (0.1126)  time: 3.6103  data: 3.2708  max mem: 5192
Epoch: [42]  [44/45]  eta: 0:00:00  lr: 0.001000  loss: 0.3600 (0.3937)  loss_classifier: 0.0736 (0.0759)  loss_box_reg: 0.1703 (0.1731)  loss_objectness: 0.0260 (0.0334)  loss_rpn_box_reg: 0.0799 (0.1113)  time: 0.7689  data: 0.4379  max mem: 5192
Epoch: [42] Total time: 0:00:38 (0.8563 s / it)
Test:  [0/6]  eta: 0:00:05  model_time: 0.1430 (0.1430)  evaluator_time: 0.0046 (0.0046)  time: 0.9220  data: 0.7387  max mem: 5192
Test:  [5/6]  eta: 0:00:00  model_time: 0.1367 (0.1274)  evaluator_time: 0.0047 (0.0058)  time: 0.2666  data: 0.1256  max mem: 5192
Test: Total time: 0:00:01 (0.2740 s / it)
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.3861075990416181
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.7561515998822166
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.3918785457208387
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.3409863155683963
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.6417609618104668
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.6937293729372936
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.10378787878787879
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.39696969696969703
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.4340909090909091
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.3893805309734513
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.7
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.7 

We have the best model in the outputs/training/epochs50_allaug directory according to the best mAP object detection metrics. Along with that, the training script also stores several loss and mAP graphs.

Analyzing the Results

The following are the mAP graphs from the validation loop.

Mean Average Precision after training on the apple scab dataset.
Figure 5. Mean Average Precision after training on the apple scab dataset.

From the mAP graph (both, at 0.50 IoU and at 0.50:0.95 IoU), we can also see some indications of overfitting at the end.

However, this should not concern us much. Given the small size of the dataset, this is expected. Further, as we are saving the best model weights, we will use those for inference. So, we need not worry about any overfit model issue.

During training, the script also stores the predictions of a few images from the validation data on each epoch. Let’s take a look at them first before moving to the inference part.

Predicted images from the validation loop.
Figure 6. Apple scab detection predicted images from the validation loop.

In the above image, the model is detecting the apple scab regions correctly, apart from one image. In one of the images, the model is detecting the scab disease on the leaf.

Inference on Images for Apple Scab Detection

Carrying out inference on images is pretty easy. We can use the inference.py and just provide the directory path where all the images are present.

There are four images present in the data/inference_data directory which were neither part of the training set nor the validation set. They are random images from the internet.

We can run inference on these images by executing the following command.

python inference.py \
--weights outputs/training/epochs50_allaug/best_model.pth \
--input data/inference_data/

The following are the results.

Image inference results for apple scab detection
Figure 7. Image inference results for apple scab detection.

The first image (top-left) has a very blurry apple at the back. Still, the model was able to detect the scab disease on it. It is really impressive.

But for the image in the top-right corner, the model is making a few mistakes. It is detecting some scab disease regions on the leaves at the corners.

The next image (bottom-right) is pretty easy and the model is detecting all the scab regions pretty well.

In the final image (bottom-left) also the model predicts all the regions affected by the scab disease correctly.

Looks like even though we had very few images for training, the Faster RCNN ResNet50 FPN V2 is quite capable of learning all the necessary features.

Inference on Videos using the Trained Faster RCNN ResNet50 FPN V2

For the final testing of the model, we will run inference on a quite challenging video from YouTube.

It is a very fast-moving video with most of the frame out of focus. The original video has been trimmed down so that apples with scab diseases are present in most of the frames.

To run inference on videos, we need to execute the following command. You can provide the path to your own video to run inference on videos.

python inference_video.py \
--weights outputs/training/epochs50_allaug/best_model.pth \
--input data/inference_data/video_2_part1.mp4 \
--show
Clip 1. Video inference results for apple scab detection using PyTorch and Faster RCNN ResNet50 FPN V2.

There are two major observations here:

  • The model is able to detect almost all of the scab disease regions on the apples.
  • Yet, the number of false positives is high. The model is detecting patches on leaves as well as on the background which look like the apple scab disease.

What may be the reason for the second observation? Remember that we are using mosaic augmentation which crops the images and stitches them together. In this process, some of the background information from the original image is lost. When training on the mosaic images, the model does not get to see blurry backgrounds with patches which it should not focus on.

Moreover, the objects that we want to detect here are quite small. In most real-life scenarios the background will cover a major area. This shows one case where mosaic augmentation does not work very well in object detection. Most probably, training just with the simple object detection augmentations applied to the entire image will be more helpful for the model. But we are keeping that experiment for a future post.

Further Improvements

We can take this project a notch up. Along with predicting the bounding boxes for the apple scab disease on the fruits, we can do so for the leaves as well. This will again require annotating a lot of images of apple leaves with scab disease.

But after the entire project is done, it will be really nice to have a single model for apple scab disease detection of both leaves and fruits.

Summary and Conclusion

In this blog post, we carried out apple scab detection using the PyTorch Faster RCNN ResNet50 FPN V2 model. We started with the dataset discussion, moved on to the code base discussion, and then trained the model. We also ran inference on images and videos which gave us more insights. This experiment also proved that all the augmentation techniques may not be beneficial when trying to detect small objects with a lot of background. Hopefully, this was a good learning experience for you.

If you have any doubts, thoughts, or suggestions, please leave them in the comment section. I will surely address them.

You can contact me using the Contact section. You can also find me on LinkedIn, and Twitter.

Images / Videos Credits Used for Inference

Liked it? Take a second to support Sovit Ranjan Rath on Patreon!
Become a patron at Patreon!

3 thoughts on “Apple Scab Detection using PyTorch Faster RCNN”

  1. Yacine says:

    Great tutorial.
    How did you write all the code for this tutorial? I found it too complex!

    1. Sovit Ranjan Rath says:

      Hello Yacine. I have been developing the codebase (repository) that I have mentioned and referred to here for a long time. It is a bit large but contains all the things that are needed in a complete object detection pipeline. So, it is a bit complicated, not going to lie about that.

      But in general, I love to code deep learning stuff and that’s I started this repository. I also use it in a lot of my projects.

      1. Yacine says:

        Ah, I see! wow that’s impressive. good job 🙂

        Also, I am impressed with the high quality of your articles, keep going.

Leave a Reply

Your email address will not be published. Required fields are marked *