Deep Learning: An Introduction to Convolutional Neural Networks

The Convolutional Neural Network (CNN for short) is perhaps the most widely used deep learning model for computer vision applications. In this article, you will get to know the basic working principle of convolutional neural networks.

The Building Blocks of a CNN

Convolutional Neural Networks are widely used nowadays for image-classification tasks. You can find them in real-world applications as well as in research and academia.

A CNN has two main building blocks:
1. The convolutional layer
2. The pooling layer

Let’s go through each of them briefly.

Convolutional Layer

The convolutional layer is the most important building block of any CNN. If we take an image as the input, then the next layer is a convolutional layer that extracts specific features from that image.

The convolutional layer contains neurons that extract features from the input image. However, each neuron is connected only to the pixels in its Local Receptive Field. Now, what is a local receptive field?

A local receptive field is the small area of the input that a single neuron focuses on. Because neighboring neurons focus on neighboring areas, their receptive fields overlap, and together they cover the complete image. The following image will give you a much better idea.

Image: Convolutional Layer and Receptive Fields

Eventually, the whole image is covered once every neuron is connected to the pixels in its own receptive field. This principle is very similar to how the visual cortex works in humans. Moreover, because each neuron focuses on only a small part of the image, it can pick up local features such as edges and textures. This is one of the reasons why CNNs work so well.
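To make the idea of a local receptive field concrete, here is a minimal sketch in plain Python. It assumes a single-channel image stored as nested lists, no padding, and a step of one pixel; the function name `convolve2d` and the example kernel are hypothetical and chosen only for illustration. Like most deep learning libraries, it slides the kernel without flipping it.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, step of 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The neuron at (i, j) only sees the kh x kw patch that starts
            # at (i, j). That patch is its local receptive field.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4x4 grayscale image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

Neighboring output values come from patches that share a column of pixels, which is the overlap of receptive fields described above. The non-zero middle column shows the neurons whose receptive field contains the edge.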

Feature Map

You must have noticed the term feature map in the above image. A feature map has three dimensions (height, width, and depth), and the convolution operations are carried out on it. For the input image, the depth, which you can also call the channels axis, is 3 if the image is in color: one channel each for red, green, and blue (RGB). If the image is grayscale, the depth is 1, as every pixel is just a level of gray.
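As a small illustration of the height, width, and depth axes, the sketch below builds a tiny color image and a tiny grayscale image as nested lists and reads off their shapes. The helper `shape` and the 2x3 image size are assumptions made for this example only.

```python
def shape(image):
    """Return (height, width, channels) of an image stored as nested lists."""
    return (len(image), len(image[0]), len(image[0][0]))


height, width = 2, 3

# Each pixel of a color image holds three values: red, green, blue.
color_image = [[[0, 0, 0] for _ in range(width)] for _ in range(height)]

# Each pixel of a grayscale image holds a single gray level.
gray_image = [[[0] for _ in range(width)] for _ in range(height)]

print(shape(color_image))  # (2, 3, 3)
print(shape(gray_image))   # (2, 3, 1)
```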

Pooling Layer

A convolutional neural network is a stack of convolutional layers and pooling layers. The job of the pooling layer is to downsample the feature map.

In a convolutional layer, each neuron is connected to a limited number of neurons in the previous layer. The same happens in a pooling layer. The pooling layer downsamples the feature map by working on it in small patches. Let’s look at an image to get a clearer picture.

The pooling layer takes a pooling window size. It is 2x2 for the above image, as each patch covers 2 rows and 2 columns.
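The downsampling step can be sketched in a few lines of plain Python. The article does not say which pooling operation is used, so this example assumes max pooling (keeping the largest value in each patch) with non-overlapping 2x2 windows; the function name `max_pool2d` and the sample values are hypothetical.

```python
def max_pool2d(feature_map, window=2):
    """Downsample by keeping the largest value in each window x window patch."""
    pooled = []
    for i in range(0, len(feature_map) - window + 1, window):
        row = []
        for j in range(0, len(feature_map[0]) - window + 1, window):
            # Collect the values of the patch that starts at (i, j).
            patch = [
                feature_map[i + di][j + dj]
                for di in range(window)
                for dj in range(window)
            ]
            row.append(max(patch))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 4], [7, 9]]
```

With a 2x2 window, the 4x4 feature map shrinks to 2x2, so both the height and the width are halved.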

Stacking Up

A Convolutional Neural Network is usually a stack of convolutional layers and pooling layers, followed at the end by a fully connected layer. We can visualize it somewhat like the following image.

Image: Stacking a CNN
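One way to get a feel for the stack is to follow the shape of the data as it passes through each layer. The sketch below does only that and computes no actual features. It assumes a hypothetical 28x28 grayscale input, 3x3 convolutions without padding that move one pixel at a time, and non-overlapping 2x2 pooling windows; the filter counts (8 and 16) and the helper names are made up for illustration.

```python
def conv_output(shape, kernel_size, num_filters):
    """Shape after a convolution with no padding and a step of 1."""
    h, w, _ = shape
    return (h - kernel_size + 1, w - kernel_size + 1, num_filters)


def pool_output(shape, window):
    """Shape after pooling with non-overlapping window x window patches."""
    h, w, c = shape
    return (h // window, w // window, c)


shape = (28, 28, 1)                # hypothetical grayscale input image
shape = conv_output(shape, 3, 8)   # (26, 26, 8)
shape = pool_output(shape, 2)      # (13, 13, 8)
shape = conv_output(shape, 3, 16)  # (11, 11, 16)
shape = pool_output(shape, 2)      # (5, 5, 16)

# The fully connected layer receives the final feature maps as one flat vector.
flattened = shape[0] * shape[1] * shape[2]
print(shape, flattened)  # (5, 5, 16) 400
```

The height and width shrink at every step while the depth grows with the number of filters, which matches the usual picture of a CNN narrowing toward the fully connected layer.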

Conclusion

This article covers the very basics of a CNN. There are many things that I did not cover, as I wanted this article to be purely introductory. A future article will cover the concepts of padding, strides, CNN architectures, and implementing a CNN in Python.

If you gained any knowledge, then like and share the article. You can follow me on Twitter and Facebook to get future updates. Sign up for the newsletter to get more content. You can follow me on LinkedIn as well.

Liked it? Take a second to support Sovit Ranjan Rath on Patreon!