One Hot Encode in Machine Learning

Data preparation or processing is one of the most important steps when working with real-world data on a machine learning project.

One of the major pain points in such projects is working with categorical data, because most machine learning algorithms cannot work with categorical data directly. The categories need to be converted to numerical data first. One-hot encoding is a very good solution for handling categorical data.

In this article, we will see what one-hot encoding is and where to use it.

Categorical Data

So, what is categorical data actually?

Simply speaking, in categorical data the values are labels instead of numbers.

Take the following case for example.

When you want to categorize ‘salary’ in a data set, you may label it as ‘low’, ‘medium’, or ‘high’.

Table showing salary category

Similarly, if you want to label the ‘shape’ of objects, you could use labels such as ‘round’, ‘square’, and ‘triangular’.

The Problem with Categorical Data

You may be thinking that categorizing values as labels seems fair enough and reasonable, and you are right. But when machine learning algorithms come into the picture, the situation changes.

Most machine learning algorithms cannot handle categorical labels in a data set directly. Whether the task is classification or regression, the algorithms need numerical data to carry out the predictions.
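As a rough illustration of why labels cause trouble, here is a minimal sketch. The `linear_score` function is a hypothetical toy model written for this example, not part of any library. It only shows that the arithmetic behind a typical algorithm works on a number but fails on a label.

```python
def linear_score(weight, bias, feature):
    """Toy linear model: weight * feature + bias (illustrative only)."""
    return weight * feature + bias


# A numeric feature takes part in the arithmetic without any problem.
numeric_result = linear_score(0.5, 1.0, 4.0)

# A categorical label such as 'low' cannot be multiplied by a float weight.
try:
    linear_score(0.5, 1.0, "low")
    label_failed = False
except TypeError:
    label_failed = True

print(numeric_result)  # 3.0
print(label_failed)    # True
```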

So, we need to convert the categorical labels into numerical values. In the machine learning world, this is often referred to as data transformation.

Now, let us see the different ways in which categorical labels can be handled.

Handling Categorical Labels

We will focus mainly on two methods here.

Label Encoding
One-Hot Encoding

1. Label Encoding

Label encoding is really simple. You assign an integer to each categorical label.

If we again consider the salary example, then you can encode low as 1, medium as 2, and high as 3.
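The salary mapping can be sketched in a few lines of plain Python. The variable names and the sample `salaries` list are assumptions made up for this illustration.

```python
# Categories listed in the order we want to number them (assumed order).
salary_levels = ["low", "medium", "high"]

# Build the mapping: low -> 1, medium -> 2, high -> 3.
label_to_int = {label: i for i, label in enumerate(salary_levels, start=1)}

# A made-up column of salary labels to encode.
salaries = ["medium", "low", "high", "low"]
encoded = [label_to_int[label] for label in salaries]

print(label_to_int)  # {'low': 1, 'medium': 2, 'high': 3}
print(encoded)       # [2, 1, 3, 1]
```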

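This process works fine as long as the number of labels is small. When the number of labels increases, this solution may not work very well. The integers also imply an order (1 < 2 < 3), which suits ordered labels such as salary levels but can mislead an algorithm when the labels have no natural order.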

This brings us to the second technique, One-Hot Encoding.

2. One-Hot Encoding

In one-hot encoding, the categorical variable is replaced by binary variables, one for each category.

So, the variable for each category is either 0 or 1. Again, take the shape example. If the shape is, say, ‘triangular’, then the ‘triangular’ variable is set to 1 and the variables for all other shapes are set to 0.
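Here is a minimal sketch of the same idea for the shape example. The helper `one_hot` is a hypothetical function written for this article, not a library call. It assumes the full list of categories is known in advance.

```python
# The known categories, in a fixed order (assumed for this example).
shapes = ["round", "square", "triangular"]


def one_hot(label, categories):
    """Return a list with 1 at the position of `label` and 0 everywhere else."""
    if label not in categories:
        raise ValueError(f"Unknown category: {label!r}")
    return [1 if label == category else 0 for category in categories]


print(one_hot("triangular", shapes))  # [0, 0, 1]
print(one_hot("round", shapes))       # [1, 0, 0]
```

Each encoded row contains exactly one 1, which is where the name "one-hot" comes from.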

One-hot encoding is particularly used in those cases where there is no ordinal relationship between the labels.

The following image may clear some things up.

This technique really helps when the category labels are not related to each other, even when there are numerous labels.
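To see why the lack of an ordinal relationship matters, the sketch below compares distances between shapes under both encodings. The integer assignment and the `euclidean` helper are assumptions made for this illustration.

```python
import math

# Label encoding: integers assigned in an arbitrary order.
label_encoded = {"round": 1, "square": 2, "triangular": 3}

# One-hot encoding: one binary variable per shape.
one_hot_encoded = {
    "round": (1, 0, 0),
    "square": (0, 1, 0),
    "triangular": (0, 0, 1),
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# With integers, 'round' looks twice as far from 'triangular' as from 'square'.
int_round_square = abs(label_encoded["round"] - label_encoded["square"])
int_round_triangular = abs(label_encoded["round"] - label_encoded["triangular"])

# With one-hot vectors, every pair of distinct shapes is equally far apart.
oh_round_square = euclidean(one_hot_encoded["round"], one_hot_encoded["square"])
oh_round_triangular = euclidean(one_hot_encoded["round"], one_hot_encoded["triangular"])

print(int_round_square, int_round_triangular)              # 1 2
print(math.isclose(oh_round_square, oh_round_triangular))  # True
```

The integer codes suggest that some shapes are closer to each other than others, while the one-hot vectors treat all shapes as equally different.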

Conclusion

In this article, you learned about handling categorical labels in machine learning. I hope that you could get some knowledge out of it. If you have any thoughts, then comment in the comment section. Follow me on Twitter to get updates on articles.

Liked it? Take a second to support Sovit Ranjan Rath on Patreon!
