You have finally decided to embark on the journey to become a data scientist. You have started watching tutorials and taking online courses, and you are having lots of fun solving problems in online interactive environments. Now you want to set up your own computer so that you can have just as much fun without those online environments. Right? But the number of things you have to install and set up feels overwhelming. This article will help you set up your computer for data science and machine learning.
What Will This Guide Cover?
- Installing Python
- Installing Anaconda
- Setting up a virtual environment
- Setting up Jupyter Notebook
- Installing required Python packages
This article covers all of the above steps on Windows. If you are a Mac or Linux user, I would recommend following a tutorial written for your system. Still, you are welcome to stick around. It never hurts to learn how other systems work, and you may be able to help a Windows user with this someday.
Installing Python
Python has become the go-to programming language for data science and machine learning. Other languages such as R are also widely used, but the wide availability of Python libraries is what makes it so special. So, the first step is to install Python. I would highly recommend getting the latest version. You can download the Windows installer here. Installation is more hassle-free if you check the ‘Add Python <version> to PATH’ box while installing. That way you do not have to edit PATH manually after installation.
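As a quick sanity check after installation, the short script below reports which Python version is running and where the python command resolves on PATH. It is only a sketch: the function name python_summary and the file name check_python.py are made up for illustration, and the script uses nothing beyond the standard library.

```python
import shutil
import sys


def python_summary():
    """Return the running interpreter's version and the python found on PATH."""
    # Build a "major.minor.micro" string from the running interpreter
    version = "{}.{}.{}".format(*sys.version_info[:3])
    # shutil.which returns None when no "python" executable is on PATH
    on_path = shutil.which("python")
    return version, on_path


if __name__ == "__main__":
    version, on_path = python_summary()
    print("Running Python", version)
    print("python found on PATH at:", on_path or "not found")
```

Save it as check_python.py and run python check_python.py from the command line. If the command itself is not recognized, Python was most likely not added to PATH during installation.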
Installing Anaconda
Anaconda is a data science distribution for Python and R. It ships with conda, a package and environment manager that will help you create your own environment for data science, as you will see later in this post. Anaconda bundles its own Python interpreter, which is the one used inside the Anaconda Prompt. Anaconda is also a commonly recommended way to install Jupyter Notebook.
Click here to go to the official Anaconda website and download the installer.
Setting up a virtual environment
Anaconda comes with many data science packages pre-installed. Even so, you will create a new virtual environment (virtual env) and install the required packages yourself, which is good practice while you are getting started.
What is a virtual environment?
Sometimes, you may need a specific version of some library. At some other time, you may need a specific version of Python. Also, you may not need all the libraries and modules at all times. So, why not build a separate directory and install only the libraries and modules that you need? Setting up a virtual environment does exactly that.
A virtual environment will have all the packages that you will require for a specific project and some other additional packages as well. So, let’s set one up. After opening your Anaconda Prompt, follow along:
(base)C:\Users> conda create -n envname python=3 anaconda
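In the above command, envname is the name that you want to give the environment. Note that the environment is not created in the current directory (C:\Users here); by default, conda stores it in the envs folder of your Anaconda installation. You can specify whichever version of Python you are comfortable with. The trailing anaconda argument installs the full set of Anaconda packages into the new environment; leave it out if you prefer to start with a minimal environment. If prompted, press y to proceed.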
Now to check whether or not the environment was created, type
conda env list
You should see a list of the created environments. Now, to activate an environment, simply type
conda activate envname
And to deactivate,
conda deactivate
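If you are ever unsure which environment a Python session is using, a small check like the one below can help. It is a sketch that assumes conda's usual behaviour of setting the CONDA_DEFAULT_ENV environment variable on activation; the function name active_environment is made up for illustration.

```python
import os
import sys


def active_environment():
    """Return (environment name, interpreter prefix) for the current session."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated;
    # the variable is absent when Python runs outside conda.
    name = os.environ.get("CONDA_DEFAULT_ENV", "(no conda environment active)")
    # sys.prefix is the folder of the Python installation currently in use
    return name, sys.prefix


if __name__ == "__main__":
    name, prefix = active_environment()
    print("Active environment:", name)
    print("Interpreter prefix:", prefix)
```

After conda activate envname, the reported name should be envname and the prefix should point inside that environment's folder.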
Setting up Jupyter Notebook
Jupyter Notebook is the interactive environment where you will write your code, create files, and build visualizations. If you are going to use Jupyter Notebook for the first time, I recommend that you visit its website to learn more about what it can do.
Type jupyter notebook in the command line to start Jupyter Notebook. If you get an error stating that Jupyter Notebook is not installed, type the following command to install it (on Windows the interpreter is normally invoked as python rather than python3),
python -m pip install jupyter
Everything is set now, and you can start Jupyter Notebook from the command line. All that remains is to install the most common packages that you will use on a daily basis for data science.
Installing required Python packages
To get started, you have to install the Python packages that are most frequently required for data science.
Type the following commands after activating your environment, and type y whenever prompted.
conda install numpy
conda install pandas
conda install matplotlib
conda install seaborn
conda install scipy
conda install scikit-learn
You may need more packages in the future as you move along. You will have to learn about them and install them as and when required. For now, the above packages should suffice. Be sure to read up on each of them as well.
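Once the installs finish, one way to confirm that the packages landed in the active environment is to ask Python for their versions. The sketch below uses the standard library's importlib.metadata (available from Python 3.8) and assumes each package was installed with its usual distribution metadata; the names PACKAGES and installed_versions are made up for illustration.

```python
from importlib import metadata

# Distribution names, matching the install commands above
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "scipy", "scikit-learn"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the environment running this script
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, "->", version or "not installed")
```

Run it with the environment activated. Any package reported as not installed can be added with the matching conda install command.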