Wild Cats Image Classification using Deep Learning

Updated: April 8, 2020.

You must have come across numerous tutorials to distinguish between cats and dogs using deep learning. Well, then this tutorial is going to be a bit different and a whole lot interesting. In this article, we too will be using deep learning with Keras and TensorFlow for image classification. But with one difference, we will be classifying between images of wild cats, namely, golden tigers, white tigers and, mountain lions.

You can go ahead and download the project file. After extracting the file you will have the data, the pre-trained model and, also the original Jupyter Notebook.

Download Project and Trained Models

The Directory Structure

It is not mandatory that you should use the same data as provided here. You can use your own data and by the end of this post, you will have a personal project ready. Just make sure that the folder structure of the data should be the same as provided. The following is the tree structure of the project directory.

├───.ipynb_checkpoints
├───data
│   ├───golden_tiger
│   ├───mountain_lion
│   └───white_tiger
├───images
└───logs
    ├───wild-cats-1560494714
    └───wild-cats-1560495009

We have the data directory and three sub-directories inside it. The sub-directories (golden_tiger, mountain_lion, white_tiger) contain the respective images of the wild cats. If you consider using your own data, then just replace the sub-directories with your own. The names of the sub-directories will act as labels. So, it will be better if you give intuitive names to those.

The images directory contains the test images that we will use after the model has been trained.

The logs directory contains the TensorBoard logs. We will be using TensorBoard to visualize the train and validation plots and also see the graph of the model. If you are going to use TensorBoard for the first time, then you may find this article useful.

Moreover, if you want to download new images for this tutorial, then please refer to my previous article – Create Your Own Deep Learning Image Dataset. Just one more thing. We will be using tf.keras module instead of the direct Keras API. If you find the model building codes too long, then feel free to use Keras directly. Okay, let’s start.

Loading Data and Preprocessing

The following block of code imports all the necessary packages.

# import packages
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import random
import pickle
import time
import cv2
import os

from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from imutils import paths

The above code is mostly self-explanatory.

time module will help us to get a different timestamp for each of the TensorBoard logs so that the previous ones do not get replaced.
Using cv2 we will read the images, make necessary changes before feeding the image to the model and also visualize the predictions.
We will use LabelBinarizer from Scikit-Learn to get the image labels.

Moving on, let’s get the image paths:

# path to the images
data_path = 'data'

images = []
image_labels = []

# get the image paths inside the `data` directory
image_paths = sorted(list(paths.list_images(data_path)))
# for reproducibility
random.seed(42)
# shuffle the images
random.shuffle(image_paths)

We initialize two lists, images and image_labels which will hold our images and the labels of the images respectively. For this project, we have three labels, golden_tiger, white_tiger and mountain_lion. Next, we use random.seed() for reproducibility, in case we need to run the code again. Finally, we shuffle all the images.

Now, let’s read the images and resize them using cv2. We will also append each image to the images list and the labels to the image_labels list.

for image_path in image_paths:
    # load and resize the images
    image = cv2.imread(image_path)
    image = cv2.resize(image, (128, 128))
    images.append(image)
    
    # get the labels from the image paths and store in `image_labels`
    image_label = image_path.split(os.path.sep)[-2]
    image_labels.append(image_label)

It is a good idea to rescale the image pixels. Rescaling is a very important factor when considering the accuracy and performance of neural networks.

# rescale the image pixels 
images = np.array(images, dtype='float') / 255.0
# make `image_labels` as array
image_labels = np.array(image_labels)

Moving on, we will now split the data into train and validation test. We will be using 80% of the data for training and 20% for validation.

# divide the image data into train set and test set
(train_X, test_X, train_y, test_y) = train_test_split(images, image_labels, 
                                                      test_size=0.2, 
                                                      random_state=42)

The image_labels is a list of strings. To use it properly during training and inference (prediction) we need to convert those into one-hot labels.

# one-hot encode the labels
lb = LabelBinarizer()
train_y = lb.fit_transform(train_y)
test_y = lb.fit_transform(test_y)

So, now for golden_tiger, we will have [1, 0, 0]. This means that the image is true for golden_tiger only (hence, the 1). Similarly, it will be [0, 1, 0], [0, 0, 1] for mountain_lion and white_tiger respectively.

Next, we will insert the code image augmentation generator. This will provide our model with a lot of different cases to learn while training. Hopefully, this step will help in achieving better accuracy.

# generator for image augmentation
image_aug = tf.keras.preprocessing.image.ImageDataGenerator(rotation_range=30, shear_range=0.2, 
    zoom_range=0.2, height_shift_range=0.2, width_shift_range=0.2, 
    horizontal_flip=True, fill_mode='nearest')

For data augmentation, we are using the ImageDataGenerator from tf.keras module.

Building the Model

Now, its time to build our model. Frankly, the model does have a lot of layers. Let’s insert the code fist, then I will move on to explain the layers.

# build the model
model = tf.keras.models.Sequential()
input_shape = (128, 128, 3)

model.add(tf.keras.layers.Conv2D(32, (3, 3), padding='same', 
            activation='relu', input_shape=input_shape))
model.add(tf.keras.layers.BatchNormalization(axis=-1))      
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.2))
        
model.add(tf.keras.layers.Conv2D(64, (3, 3), padding='same', 
                                 activation='relu'))
model.add(tf.keras.layers.BatchNormalization(axis=-1))
model.add(tf.keras.layers.Conv2D(64, (3, 3), padding='same', 
                                 activation='relu'))
model.add(tf.keras.layers.BatchNormalization(axis=-1))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.2))

model.add(tf.keras.layers.Conv2D(128, (3, 3), padding='same', 
                                 activation='relu'))
model.add(tf.keras.layers.BatchNormalization(axis=-1))
model.add(tf.keras.layers.Conv2D(128, (3, 3), padding='same', 
                                 activation='relu'))
model.add(tf.keras.layers.BatchNormalization(axis=-1))
model.add(tf.keras.layers.Conv2D(128, (3, 3), padding='same', 
                                 activation='relu'))
model.add(tf.keras.layers.BatchNormalization(axis=-1))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.2))

model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(512, activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(0.2))

model.add(tf.keras.layers.Dense(len(lb.classes_), activation='softmax'))

First of all, we are using the Sequntial() API to build the model. Secondly, the we have given the input shape as (128, 128, 3) for the height, width and channels. So, it is a channels last input shape.

Now, moving on to the actual layers, we are using Conv2D with relu activation function and padding=same. For the first convolutional layer we have given the output dimensionality as 32. Then we gradually increase it to 64 and 128. Each of the Conv2D layer is having its own BatchNormalization layer as well. The axis is -1 as we are using channels last. Then we have the MaxPooling layer with pool_size=(2, 2) and Dropout layer with a rate of 0.2.

In the model, the only exceptions are the last hidden layer and the output layer. For the last hidden layer we are using Dense with 512 dimensionality. The output layer is using softmax activation function and the output dimensionality is len(lb.classes). That would amount to 3 as we have 3 classes in total.

Now, we are going to create a TensorBoard callback to save the training logs.

NAME = 'wild-cats-{}'.format(int(time.time())) # to save different tensorboard logs each time
 
tensorboard = tf.keras.callbacks.TensorBoard(log_dir='logs/{}'.format(NAME))

Compiling and Running the Model

In this part, we are going to define the optimizer, compile the model and train it using fit_generator. We will be using fit_generator() instead of fit() as we have an ImageDataGenerator() in the pipeline for augmenting the images.

optimizer = tf.keras.optimizers.Adam(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, 
              metrics=['accuracy'])

history = model.fit_generator(image_aug.flow(train_X, train_y, 
                                             batch_size=32), 
                              validation_data=(test_X, test_y), 
                              steps_per_epoch=len(train_X) // 32, 
                              epochs=50, callbacks=[tensorboard])

Saving the Model

It is always a good idea to save the model weights after training. Here, we will be saving the trained model as well as the labels as a pickle file.

model.save('conv2d.model')
f = open('conv2d_lb.pickle', 'wb')
f.write(pickle.dumps(lb))
f.close()

Plotting and TensorBoard Visualization

Before we move on to the inference stage let’s see all the accuracies and losses in a single plot. That would help in comparing them in a better way.

num_epochs = np.arange(0, 50)
plt.figure(dpi=300)
plt.plot(num_epochs, history.history['loss'], label='train_loss', c='red')
plt.plot(num_epochs, history.history['val_loss'], 
    label='val_loss', c='orange')
plt.plot(num_epochs, history.history['acc'], label='train_acc', c='green')
plt.plot(num_epochs, history.history['val_acc'], 
    label='val_acc', c='blue')
plt.title('Training Loss and Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Loss/Accuracy')
plt.legend()
plt.savefig('plot.png')

Graph plot for the trained model — Model Plot

The validation loss does not look very stable. There can be numerous reasons for this. One of the reasons is the similarity between the images and also the small number of images. You can also get a TensorBoard visualization of the plots and the model graph as well. In the command line (in the project directory), type the following:

tensorboard --logdir=logs/

Then go to http://localhost:6006/#scalars .

You can also click on the GRAPHS tab to see the model graph.

Image for tensorboard plot — Tensorboard plots

Image for tensorboard graph — Tensorboard graph

Predicting

The code that follows can be run fully independently without running any of the previous code after you have completed the training part. So, you can always predict on new images only by executing the following code snippets. If you want, you can also save this part in a predict.py python file which you can run independently without opening the Jupyter Notebook.

Let’s load one of the test images, scale the pixels and reshape it as well:

# load the test image
image = cv2.imread('images/golden.jpg')
output = image.copy()
image = cv2.resize(image, (128, 128))

# scale the pixels
image = image.astype('float') / 255.0

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

Loading the model and the label binarizer.

model = tf.keras.models.load_model('conv2d.model')
lb = pickle.loads(open('conv2d_lb.pickle', 'rb').read())

Now, predicting and getting the class labels.

# predict
preds = model.predict(image)

# get the class label
max_label = preds.argmax(axis=1)[0]
print('PREDICTIONS: \n', preds)
print('PREDICTION ARGMAX: ', max_label)
label = lb.classes_[max_label]

PREDICTIONS: [[9.2370689e-01 2.5450445e-06 7.6290570e-02]] 
PREDICTION ARGMAX: 0

Finally, let’s see the predictions.

# class label along with the probability
text = '{}: {:.2f}%'.format(label, preds[0][max_label] * 100)
cv2.putText(output, text, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
cv2.imshow('image', output)
cv2.waitKey(0)

Image predicted by the model — Prediction

Our model is predicting the image as golden_tiger with 92.3% accuracy, which is fairly good. But when you predict for the other images, you may sometimes get wrong results. One reason is that our data set is fairly small (only about 350 images in total for each class). You can always try to find more images for better training. Other times you may find that if white tiger image has a yellowish hue (maybe sunrise), then also the model is predicting the image as golden_tiger or maybe mountain_lion. Adding a variety of images covering different scenes appears to be the only solution here. You should surely try out these.

Conclusion

In this tutorial, you learned how to build your own image classifier using TensorFlow and Keras. From here on, you can try training the model on different images.

Again, you can download the files here: