In this article, we will go through image classification using deep learning. Image classification is a sub-field of computer vision. There have been many advances in deep learning with neural networks, and as a result, computer vision has seen rapid progress and widespread application in recent years.
Image classification, image recognition, object detection and localization, and image segmentation are some of those impacted areas.
Computer vision is a very vast field, and image classification is the perfect place to start learning deep learning with neural networks. We will cover as much of the basics as possible to get you up and running and make you comfortable with the topic.
Overview
The following is a brief overview of what we will be covering in this article:
- A brief overview of deep learning and neural networks.
- Using Dense Neural Network Layers for image classification.
- Using Convolutional Neural Networks for image classification.
Basically, we will cover two neural network approaches to image classification. This will help you better understand the underlying architectural details of neural networks and how they work.
We will use the Keras library in this tutorial, which is convenient and easy to use.
The Dataset
For the dataset, we will use Fashion MNIST, which is very beginner-friendly. You can visit the GitHub repository here. The dataset contains 60000 training examples and 10000 test examples, each a 28×28 grayscale image.
As we will be using Keras, we can directly download the dataset from the Keras library.
The fashion items in the dataset belong to the following categories.
| Label | Class |
| ----- | ----- |
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |
You can see that each fashion item has a corresponding label from 0 to 9.
Installing Keras and TensorFlow
Before moving further, if you need to install the Keras library, execute the following command in your terminal:
pip install keras
Keras is a high-level API, and we will be using TensorFlow as the backend. To install TensorFlow, execute the following command:
pip install tensorflow
If your system has an NVIDIA GPU, then you can also install the GPU version of TensorFlow with the following command:
pip install tensorflow-gpu
Note: A GPU is not strictly necessary for this tutorial, but training will be faster when using one.
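If you want to verify that TensorFlow can actually see your GPU, a quick check like the following should work; the exact API depends on your TensorFlow version:

import tensorflow as tf

# TensorFlow 1.x (and early 2.x) installations:
print(tf.test.is_gpu_available())

# On TensorFlow 2.x, the preferred check is:
# print(tf.config.list_physical_devices('GPU'))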
Deep Learning and Neural Networks
In the past, traditional machine learning techniques were used for image classification. But neural networks, and especially Convolutional Neural Networks (thanks to Yann LeCun), have completely changed how we approach computer vision and deep learning today.
In this tutorial, we will be using two different types of layers for image classification. First, we will use the Dense layer, and then the Conv2D layer, which implements 2D convolution in Keras.
Now, you are all set to follow along with the code. If you want, you can type along as you follow.
Load the Libraries
First, we will load all the required libraries and modules.
import numpy as np
import keras
import matplotlib.pyplot as plt
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from keras import Sequential
from keras.datasets import fashion_mnist
Load and Prepare the Data
Now, we can download and load the Fashion MNIST data directly through the Keras library.
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
We have the following output after executing the above code block.
(60000, 28, 28)
(10000, 28, 28)
(60000,)
(10000,)
In x_train, we have 60000 examples, with the pixel values of each image arranged in a 28×28 matrix. And y_train contains 60000 labels ranging from 0 to 9. Similarly, x_test and y_test contain 10000 examples and their corresponding labels respectively.
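If you like, you can peek at a single example and its label to confirm this:

print(x_train[0].shape)  # (28, 28) -- a single grayscale image
print(y_train[0])        # an integer label between 0 and 9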
The pixel values of the images range from 0 to 255 and are stored as uint8 integers. Before training, we will scale them to the range 0.0 to 1.0, converting them to float64 in the process.

x_train, x_test = (x_train / 255.0).astype('float'), (x_test / 255.0).astype('float')
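As a quick sanity check, we can confirm the scaling worked:

print(x_train.min(), x_train.max())  # 0.0 1.0
print(x_train.dtype)                 # float64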
Visualize the Images
It will be a lot easier to analyze the data if we visualize the images in the dataset. First, let’s create a list containing all the fashion item names. This will help us to apply labels to the images in the code.
names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
The following block of code generates a plot of the first 9 images in the dataset along with their corresponding names.
plt.figure(figsize=(6, 6))
for i in range(9):
    plt.subplot(3, 3, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.xlabel(names[y_train[i]])
    plt.imshow(x_train[i], cmap='binary')
plt.savefig('fashion-plot.png')
plt.show()
If you have worked with the MNIST handwritten digits before, you will find some similarity here. Still, it is a good change and provides just enough complexity to tackle a new type of problem.
Image Classification using Dense Layers
In this section, we will use Dense() layers for image classification, stacking them with the Keras Sequential() API.
Build the Model
Let’s start by stacking up the layers to build our model.
model_dense = keras.Sequential()
model_dense.add(Flatten(input_shape=(28, 28)))
model_dense.add(Dense(16, activation='relu'))
model_dense.add(Dense(32, activation='relu'))
model_dense.add(Dense(64, activation='relu'))
model_dense.add(Dense(128, activation='relu'))
model_dense.add(Dense(256, activation='relu'))
model_dense.add(Dense(10, activation='softmax'))
Here is a brief analysis of the above code.
First, we initialize the Keras Sequential() model. Then, we add a Flatten() layer, which takes input_shape=(28, 28) as a parameter. We have observed before that the pixel values of each image form a 28×28 matrix. After Flatten(), the shape changes to (784,), a one-dimensional vector, which is the ideal input for Dense() layers.
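To see where (784,) comes from: 28 × 28 = 784, so Flatten() simply unrolls each image into one long vector, much like a NumPy reshape. A quick check:

sample = x_train[0]              # shape (28, 28)
print(sample.reshape(-1).shape)  # (784,), since 28 * 28 = 784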
After that, we have a Dense() layer with 16 units as the output dimension and the relu activation function. We stack four more such Dense() layers with relu activation, doubling the units each time up to 256.
The last Dense() layer has 10 units and the softmax activation. We use 10 units because the output can be any one of the class labels from 0 to 9.
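The softmax activation turns the raw outputs into a probability distribution over the classes. Here is a minimal NumPy sketch of what it computes (with 3 values instead of 10 for brevity):

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # each value is between 0 and 1
print(probs.sum())  # 1.0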
We can also print a summary of our model which will give us the parameter details.
print(model_dense.summary())
Model: "sequential_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= flatten_1 (Flatten) (None, 784) 0 _________________________________________________________________ dense_1 (Dense) (None, 16) 12560 _________________________________________________________________ dense_2 (Dense) (None, 32) 544 _________________________________________________________________ dense_3 (Dense) (None, 64) 2112 _________________________________________________________________ dense_4 (Dense) (None, 128) 8320 _________________________________________________________________ dense_5 (Dense) (None, 256) 33024 _________________________________________________________________ dense_6 (Dense) (None, 10) 2570 ================================================================= Total params: 59,130 Trainable params: 59,130 Non-trainable params: 0 _________________________________________________________________ None
In the next section, we are going to compile and train the model.
Compile and Train the Model
For compiling the model, we will use the adam optimizer and sparse_categorical_crossentropy as the loss. We will monitor the accuracy metric while training.
model_dense.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
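We use sparse_categorical_crossentropy because our labels are plain integers from 0 to 9. If we had one-hot encoded the labels, we would use categorical_crossentropy instead; a minimal sketch of that alternative (y_train_onehot is a hypothetical name):

from keras.utils import to_categorical

# e.g. label 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train_onehot = to_categorical(y_train, num_classes=10)
# then compile with loss='categorical_crossentropy' instead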
Now, we are all set to fit our model. The next snippet of code handles the training of the model.
history = model_dense.fit(x_train, y_train, epochs=10)
Epoch 9/10
60000/60000 [==============================] - 7s 122us/step - loss: 0.3135 - acc: 0.8832
Epoch 10/10
60000/60000 [==============================] - 8s 134us/step - loss: 0.3076 - acc: 0.8853
By the end of the 10th epoch, we are getting around 88% training accuracy.
In the above code, history stores the training accuracy and loss values for all epochs, which is 10 in our case. To access these values, we can use the following code.
acc = history.history['acc']
loss = history.history['loss']
Using the above data, we can plot our training accuracy and loss graphs with matplotlib. That will give us a better insight into our results.
Accuracy and Loss Plots
num_epochs = np.arange(0, 10)
plt.plot(num_epochs, loss, label='train_loss', c='red')
plt.plot(num_epochs, acc, label='train_acc', c='green')
plt.title('Training Loss and Accuracy using Dense Layers')
plt.xlabel('Epoch')
plt.ylabel('Loss/Accuracy')
plt.legend()
plt.savefig('plot1.png')
We can see that the loss decreases with the number of epochs while the accuracy increases. This is a good sign and shows that our model is working as expected. But what about testing the model on unseen data? After all, we want to see how well the model performs at test time. For that, we can use evaluate() to get the loss and accuracy scores on the test set.
prediction_scores = model_dense.evaluate(x_test, y_test)
print('Accuracy: ', prediction_scores[1]*100)
print('Loss: ', prediction_scores[0])
prediction_scores is a list that stores two values: the first is the test loss and the second is the test accuracy. We can access those values using list indices as we normally do. The above code snippet will output the following:
10000/10000 [==============================] - 1s 51us/step
Accuracy: 87.1
Loss: 0.3684681357383728
We have a test accuracy of 87.1%. We can obviously do better. In the next section, we will use Convolutional Neural Networks and try to increase our test accuracy.
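Before moving on, it can be instructive to look at a few individual predictions. predict() returns the 10 softmax probabilities for each image, and np.argmax() picks the most likely class. A small sketch:

preds = model_dense.predict(x_test[:5])
for p, label in zip(preds, y_test[:5]):
    print('predicted:', names[np.argmax(p)], '| actual:', names[label])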
Image Classification using CNN
We have seen how Dense() layers work in Keras. Now we will train on the same dataset, but using Conv2D(), the Keras convolutional layer that forms the building block of CNNs.
CNNs were designed for computer vision tasks, and they generally work better than other architectures for image-based problems.
While using Dense() layers, we had to flatten the input. But CNNs take their input in a slightly different form: the input shape must be (width, height, channel). width and height are common to any 2D image. But what about the channel?
Well, the channel can be either 1 or 3. A channel of 1 indicates a grayscale image, in which each pixel is a different intensity of gray. A channel of 3 indicates a color image composed of three channels: red, green, and blue.
In our case, all the images are grayscale images and therefore, the channel is going to be 1. Now, let’s reshape our training and testing data to the ideal input shape for CNN.
# reshape the inputs to (samples, height, width, channels)
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
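As an aside, np.expand_dims() achieves the same thing by adding a trailing channel axis; here is a quick demonstration on a freshly loaded copy of the data (demo_x is just a throwaway name):

(demo_x, _), _ = fashion_mnist.load_data()
print(np.expand_dims(demo_x, axis=-1).shape)  # (60000, 28, 28, 1)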
Now that we are done reshaping our data, we can move on to building our model with Sequential().
Build the CNN Model
model_cnn = Sequential()
model_cnn.add(Conv2D(32, input_shape=(28, 28, 1), kernel_size=(3, 3), strides=(2, 2), padding='same', activation='relu'))
model_cnn.add(Conv2D(64, kernel_size=(3, 3), strides=(2, 2), padding='same', activation='relu'))
model_cnn.add(MaxPooling2D(2, 2))
model_cnn.add(Conv2D(128, kernel_size=(3, 3), strides=(2, 2), padding='same', activation='relu'))
model_cnn.add(MaxPooling2D(2, 2))
model_cnn.add(Conv2D(256, kernel_size=(3, 3), strides=(2, 2), padding='same', activation='relu'))
model_cnn.add(Flatten())
model_cnn.add(Dense(10, activation='softmax'))
The first layer is a Conv2D() layer with an output dimensionality of 32. The input_shape is (28, 28, 1), as we discussed above. We can see three new parameters here: kernel_size, strides, and padding. Let's see what each of them does.
- kernel_size: specifies the size of the 2D convolution window in the form of height and width. We use a 3×3 window.
- strides: specifies how many rows and columns the window moves between successive convolutions. We use 2×2 strides.
- padding: a string that can be either valid or same. With same, the input is zero-padded so that the output size equals the input size divided by the stride, rounded up. In our case, we use padding='same'.
The following image shows 3×3 kernel size with 2×2 strides.
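Since we use padding='same' with strides of 2, each Conv2D() layer roughly halves the spatial size: the output dimension is the input dimension divided by the stride, rounded up. A quick check:

import math

# 'same' padding with stride s: output = ceil(input / s)
print(math.ceil(28 / 2))  # 14 -- spatial size after the first Conv2D
print(math.ceil(14 / 2))  # 7  -- after the second Conv2D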
Next, MaxPooling2D is used to downsample the representations; we pass it a pool_size of 2×2. This helps reduce overfitting and shrinks the feature maps, so the following layers have fewer parameters and the model trains faster.
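To make the pooling concrete, here is what 2×2 max pooling does to a tiny 4×4 input (a plain NumPy sketch, not Keras code):

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 8, 6]])

# take the max over each non-overlapping 2x2 block
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 4]
#  [7 9]]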
Finally, we flatten the feature maps and use a Dense() layer with 10 units, one for each of the 10 labels.
Compile and Train the Model
The compiling and training part of the model is going to be similar to what we have seen earlier. We will use the same compilation parameters as we did for the Dense() model.
model_cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model_cnn.fit(x_train, y_train, epochs=10)
Epoch 9/10
60000/60000 [==============================] - 8s 136us/step - loss: 0.1686 - acc: 0.9364
Epoch 10/10
60000/60000 [==============================] - 8s 138us/step - loss: 0.1539 - acc: 0.9421
By the end of 10 epochs, we have around 94% training accuracy, which is much higher than in the case of the Dense() layers.
Accuracy and Loss Plots
acc = history.history['acc']
loss = history.history['loss']
num_epochs = np.arange(0, 10)
plt.figure(figsize=(12, 8))
plt.plot(num_epochs, loss, label='train_loss', c='red')
plt.plot(num_epochs, acc, label='train_acc', c='green')
plt.title('Training Loss and Accuracy using CNN')
plt.xlabel('Epoch')
plt.ylabel('Loss/Accuracy')
plt.legend()
plt.savefig('plot2.png')
We have more than 90% accuracy during training, but let’s see the test accuracy now.
prediction_scores = model_cnn.evaluate(x_test, y_test)
print('Accuracy: ', prediction_scores[1]*100)
print('Loss: ', prediction_scores[0])
10000/10000 [==============================] - 1s 61us/step
Accuracy: 90.7
Loss: 0.3014533164203167
The test accuracy dropped by a large margin compared to the training accuracy, which suggests the model is overfitting. Maybe we need more training epochs, or maybe a better model architecture. You should surely play around some more and try to improve the accuracy. You can also post your findings in the comment section.
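As one starting point for experimentation, a common tweak is to add a Dropout layer before the classifier to fight overfitting. Here is a hedged sketch; model_cnn2 and the rate of 0.3 are just placeholders to tune, and whether this actually improves test accuracy here is something to verify for yourself:

from keras.layers import Dropout

# a smaller CNN with dropout before the classifier
model_cnn2 = Sequential()
model_cnn2.add(Conv2D(32, input_shape=(28, 28, 1), kernel_size=(3, 3), strides=(2, 2), padding='same', activation='relu'))
model_cnn2.add(Conv2D(64, kernel_size=(3, 3), strides=(2, 2), padding='same', activation='relu'))
model_cnn2.add(MaxPooling2D(2, 2))
model_cnn2.add(Flatten())
model_cnn2.add(Dropout(0.3))  # randomly zeroes 30% of activations during training
model_cnn2.add(Dense(10, activation='softmax'))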
More Materials to Get Deeper
Deep Learning and Machine Learning Books, Papers and Articles:
1. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, by Aurélien Géron (chapters on deep learning and CNNs).
2. LeNet, by Yann LeCun.
3. Convolutional Neural Network Architectures.
Summary and Conclusion
In this article, you learned how to carry out image classification using different deep learning architectures. I hope that you liked this article. Subscribe to the website to get more timely articles.
You can also follow me on Twitter and LinkedIn to get notifications about future articles.