Starting a machine learning project can be both exciting and challenging. How? Suppose the company you work for lands a project to design a predictive model for a large organization. The model should be able to carry out predictive business analytics. Being the experienced engineer that you are, you get to lead the team, and you get started right away, until you begin to face the challenges.
So, what are the challenges that machine learning engineers commonly face during a project? Want to know? Then stick with me till the end.
Amount of Data
You are trying to make a machine learn, just like a human. So it needs to be told what is what, again just like a human. But how many examples does a human being need to learn and recognize a new car model? Maybe a few. This is where the problem lies. Machines usually cannot learn a new thing from just a few instances.
Even if you are only trying to build a model that recognizes cars, you will need thousands of data instances. Add a few more types of vehicles to the prediction list and you may be looking at millions of instances.
So, this is the first problem: collecting enough data. It is true that a lot of data is being generated. But you need data that is relevant to your particular learning model. This brings us to the second point.
Relevance or Quality of Data
You would not want to train the model with examples of animals while expecting the model to recognize cars or vehicles in general.
The above scenario is typical of most machine learning projects. If the data you have collected contains a lot of noise and outliers, the model will find it harder to pick up the underlying patterns.
Then again, extracting the relevant data can take up to 80% of the time spent on a machine learning project. (You can read this article to get some more ideas.)
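To make the point about noise and outliers a bit more concrete, here is a minimal sketch of one common cleaning step: dropping values that fall far outside the interquartile range (IQR). The function name remove_outliers_iqr, the cut-off k = 1.5, and the engine-size numbers are all assumptions made up for illustration. A real project would choose its own rule based on the data.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep only the values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical engine sizes in litres; 45.0 stands in for a data-entry error.
engine_sizes = [1.2, 1.4, 1.6, 1.8, 2.0, 2.0, 2.2, 2.5, 3.0, 45.0]

cleaned = remove_outliers_iqr(engine_sizes)
print(cleaned)
```

On this toy list the rule drops the 45.0 entry and keeps everything else. Whether an extreme value is an error or a rare but genuine case is still a judgment call, so it is worth looking at what gets removed before training on the result.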
Feature Selection
You would not want your model to train on features that do not properly represent the data set. This is why feature selection is so important.
The model should always get to train on the features that have the highest impact on how well it generalizes to new data. Selecting good features for the model to train on is called feature engineering. The two most important steps in feature engineering are:
- Feature selection: This is where you select the most relevant and useful features that are already available in the data set.
- Feature extraction: In this step, you combine the available features and create even more useful features.
One more step can be added to the two above: gather more data that is suitable for the project, and make sure that the new data also contains some new features that can be fed to the system for better predictions.
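The two feature engineering steps can be sketched in a few lines of plain Python. This is only an illustration under some loud assumptions: the car data is made up, the feature names are hypothetical, features are ranked by their absolute Pearson correlation with the target, and the 0.5 threshold is arbitrary. Correlation only captures linear relationships, so real projects often use other selection methods.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up data for six cars; the target is the price in thousands.
features = {
    "engine_power_kw": [55, 70, 85, 100, 120, 150],
    "weight_kg": [950, 1100, 1200, 1350, 1500, 1700],
    "paint_code": [3, 1, 4, 1, 5, 2],
}
price = [9, 12, 15, 18, 23, 30]

# Feature selection: keep the existing features that correlate with the target.
scores = {name: abs(pearson(column, price)) for name, column in features.items()}
selected = [name for name, score in scores.items() if score >= 0.5]
print(selected)

# Feature extraction: combine two existing features into a new one.
features["power_to_weight"] = [
    power / weight
    for power, weight in zip(features["engine_power_kw"], features["weight_kg"])
]
```

Here the selection step keeps engine power and weight and drops the paint code, which has almost no linear relationship with the price. The extraction step then builds a power-to-weight ratio out of the two features that survived.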
Overfitting and Underfitting
Overfitting is the situation where the model does really well on the training set but generalizes very poorly to new data.
Underfitting occurs when the model is far less accurate than expected, even on the training set.
These two are very broad definitions of overfitting and underfitting. A whole article could be written on those two topics alone. But the above definitions capture the underlying meaning well enough.
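A tiny experiment can make the two definitions easier to see. The sketch below is an assumption-laden toy, not a recipe: the data is made up (a trend of y = 2x with noise of plus or minus 1 on the training labels), and the three models are stand-ins chosen for illustration. A constant prediction plays the underfitting model, a nearest-neighbour memorizer plays the overfitting model, and a least-squares line plays a reasonable fit.

```python
def mse(model, xs, ys):
    """Mean squared error of model(x) against the true values ys."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: the true trend is y = 2x, the training labels are noisy.
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
train_y = [1, 1, 5, 5, 9, 9, 13, 13]
test_x = [0.6, 1.6, 2.6, 3.6, 4.6, 5.6, 6.6]
test_y = [2 * x for x in test_x]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)


def constant_model(x):
    """Underfits: ignores x and always predicts the training mean."""
    return mean_y


def memorizer(x):
    """Overfits: returns the label of the closest training point, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Least-squares straight line fitted to the training set.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x
)
intercept = mean_y - slope * mean_x


def line(x):
    """A reasonable fit: follows the trend without memorizing the noise."""
    return intercept + slope * x


models = {"constant": constant_model, "memorizer": memorizer, "line": line}
results = {
    name: (mse(model, train_x, train_y), mse(model, test_x, test_y))
    for name, model in models.items()
}
for name, (train_err, test_err) in results.items():
    print(f"{name:>10}  train MSE = {train_err:6.2f}   test MSE = {test_err:6.2f}")
```

The memorizer gets a training error of exactly zero, yet its test error is higher than that of the simple line, which is the overfitting pattern. The constant model has a high error on both sets, which is the underfitting pattern.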
That’s it for this post. Comment, share and like if you found this article valuable. Suggestions are always welcome. You can follow me on Twitter as well.