Deep learning and computer vision have immense potential in the field of exercise and workout analysis. They can recognize when someone is performing an exercise incorrectly and suggest corrections for the situation. But for this, a deep learning model first has to recognize the particular exercise. To tackle that, in this blog post, we will train a CNN-based deep learning model for workout recognition.
Convolutional neural networks are great at image analysis and recognition. With the host of pretrained models out there, we can build interesting applications that solve real-world problems. In this post, we will build a simple exercise recognizer using a CNN-based deep learning model. At the end of this post, we will also discuss some ways to take this project even further.
Before jumping into the depth of this post, here are the important points that we will cover:
- We will start with a discussion of the dataset. We will use a dataset consisting of several thousand images of different exercises.
- Then we will move on to the preparation of the model. There are some important considerations that we will make while preparing the pretrained model. We will discuss this in depth.
- Next, we will discuss the preparation of the dataset.
- Then, we will move over to the training of the model.
- After training, we will run inference experiments on unseen images and videos from the internet to check the robustness of our model.
The Workout Recognition Dataset
To train our workout recognition CNN model in this tutorial, we will use the Workout/Exercise Images from Kaggle.
This dataset contains almost 13,900 images across 22 different exercises (classes). The images belonging to a particular workout reside in a folder with the same name. Each workout has at least 500 images, and some classes go up to almost 1,000 images.
The following are the names of all the classes in this dataset.
- barbell biceps curl
- lat pulldown
- pull up
- leg extension
- plank
- lateral raises
- deadlift
- chest fly machine
- push up
- tricep dips
- hip thrust
- decline bench press
- squat
- shoulder press
- t bar row
- tricep pushdown
- russian twist
- romanian deadlift
- bench press
- hammer curl
- leg raises
- incline bench press
Here are some of the ground truth images from the dataset.
Almost all of the images have been sourced from videos, so they appear as sequential frames when browsing the directories. For this reason, we will need to shuffle the dataset before training our model.
If you plan on training the model, you can go ahead and download the dataset. After extracting, you should see the following directories.
```
├── barbell biceps curl [705 entries exceeds filelimit, not opening dir]
├── bench press [625 entries exceeds filelimit, not opening dir]
├── chest fly machine [527 entries exceeds filelimit, not opening dir]
├── deadlift [530 entries exceeds filelimit, not opening dir]
├── decline bench press [514 entries exceeds filelimit, not opening dir]
├── hammer curl [546 entries exceeds filelimit, not opening dir]
├── hip thrust [557 entries exceeds filelimit, not opening dir]
├── incline bench press [729 entries exceeds filelimit, not opening dir]
├── lateral raises [843 entries exceeds filelimit, not opening dir]
├── lat pulldown [646 entries exceeds filelimit, not opening dir]
├── leg extension [586 entries exceeds filelimit, not opening dir]
├── leg raises [514 entries exceeds filelimit, not opening dir]
├── plank [993 entries exceeds filelimit, not opening dir]
├── pull up [615 entries exceeds filelimit, not opening dir]
├── push up [601 entries exceeds filelimit, not opening dir]
├── romanian deadlift [555 entries exceeds filelimit, not opening dir]
├── russian twist [522 entries exceeds filelimit, not opening dir]
├── shoulder press [512 entries exceeds filelimit, not opening dir]
├── squat [742 entries exceeds filelimit, not opening dir]
├── t bar row [668 entries exceeds filelimit, not opening dir]
├── tricep dips [698 entries exceeds filelimit, not opening dir]
└── tricep pushdown [625 entries exceeds filelimit, not opening dir]
```
In the next section, we will see how to structure the entire project.
Project Directory Structure
Let’s check out the entire project directory structure for workout recognition using CNN.
```
├── inference_data
│   ├── barbell_bicep_curl.jpg
│   ├── bench_press.jpg
│   ├── deadlift.jpg
│   ├── hammer_curl.jpg
│   ├── lat_pulldown.jpg
│   ├── plank.mp4
│   └── push_up.mp4
├── input
│   ├── barbell biceps curl
│   ├── bench press
│   ...
│   ├── tricep dips
│   └── tricep pushdown
├── outputs
│   ├── inference_results
│   ├── accuracy.png
│   ├── best_model.pth
│   ├── loss.png
│   └── model.pth
└── src
    ├── class_names.py
    ├── datasets.py
    ├── inference.py
    ├── inference_video.py
    ├── model.py
    ├── train.py
    └── utils.py
```
- The `inference_data` directory contains a few images and videos that are not part of the dataset. We will use this to run inference experiments after training the model.
- After downloading and extracting the dataset, we put it in the `input` directory.
- Next, the `outputs` directory contains all the outputs from training and inference.
- And the `src` directory contains all the source code files.
You will get access to the inference data, the source code files, and the trained weights when downloading the zip file that comes with this post. If you plan on training your own model as well, please download the dataset from Kaggle before moving further.
The PyTorch Version
We are using PyTorch 2.0.0 for this project. Please ensure that you have PyTorch version 1.13.0 or higher for smooth execution.
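You can quickly verify your local setup with a snippet like the one below (the exact versions printed will differ on your machine):

```python
import torch
import torchvision

# Confirm the installed versions and GPU availability.
print(torch.__version__)          # should be >= 1.13.0, e.g. 2.0.0
print(torchvision.__version__)    # the matching torchvision build
print(torch.cuda.is_available())  # True if training can run on the GPU
```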
Workout Recognition using CNN and Deep Learning
From here on, we will have a more technical discussion. As all the source code is available via the downloadable file, we will only discuss the code of the most important sections. These include the preparation of the model, the data processing and augmentation, and the command line arguments for the training script.
The CNN Model for Workout Recognition
We are going to use a pretrained ResNet50 model and fine-tune it on this dataset. Here is the code that goes into the `model.py` file.
```python
from torchvision import models
import torch.nn as nn

def build_model(fine_tune=True, num_classes=10):
    model = models.resnet50(weights='DEFAULT')
    if fine_tune:
        print('[INFO]: Fine-tuning all layers...')
        for params in model.parameters():
            params.requires_grad = True
    if not fine_tune:
        print('[INFO]: Freezing hidden layers...')
        for params in model.parameters():
            params.requires_grad = False
    model.fc = nn.Linear(in_features=2048, out_features=num_classes, bias=True)
    return model
```
The `build_model` function accepts two parameters:

- `fine_tune`: A boolean parameter that tells the code to retrain all the layers of the ResNet50 model if we pass `True`.
- `num_classes`: A parameter accepting the number of classes in the dataset.
Apart from that, we modify the final linear layer of the model, i.e., `model.fc`. Its `out_features` should be equal to the number of classes in the dataset. We make that change just before returning the model in the above code block.
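For this dataset, we will eventually call the function with `num_classes=22`. As a quick sanity check (this snippet is not part of the original scripts), we can instantiate the model and count the parameters:

```python
from model import build_model

# Build the ResNet50 with a 22-class head and fine-tune all layers.
model = build_model(fine_tune=True, num_classes=22)

total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{total_params:,} total parameters.")
print(f"{trainable_params:,} trainable parameters.")
```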
The Dataset Preparation
Two of the most important parts of the dataset preparation are the transforms and the augmentations. To make the learning process more robust and prevent overfitting, we apply a few augmentations. The following code block shows the part of `datasets.py` that defines the training and validation transforms.
```python
### Part of datasets.py ###

# Training transforms
def get_train_transform(image_size):
    train_transform = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.CenterCrop(224),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(35),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        )
    ])
    return train_transform

# Validation transforms
def get_valid_transform(image_size):
    valid_transform = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        )
    ])
    return valid_transform
```
While applying the transforms, we first resize the images to 256×256 resolution. Then we apply a center crop to bring them to 224×224. This helps achieve slightly higher accuracy, most likely because in almost all the images the person performing the exercise is at the center of the frame. By removing the unwanted regions, the model gets to focus on the exercise itself. Other than that, we also apply the ImageNet normalization values, as we are using a pretrained model.
Also, we apply two more augmentations to the training set. They are horizontal flipping and random rotation. This further prevents overfitting while training the model.
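The rest of `datasets.py` is not reproduced here. As a rough sketch of how the datasets and data loaders may be put together (the `ImageFolder` usage and the function names are assumptions; the 90/10 split reproduces the 12468/1385 image counts seen later in the training logs), it could look something like this:

```python
# Rough sketch continuing datasets.py; not the exact project code.
import torch
from torchvision import datasets
from torch.utils.data import DataLoader, Subset

def get_datasets(root_dir='../input', image_size=256, valid_split=0.1):
    # Two ImageFolder views of the same directory with different transforms.
    train_dataset = datasets.ImageFolder(root_dir, transform=get_train_transform(image_size))
    valid_dataset = datasets.ImageFolder(root_dir, transform=get_valid_transform(image_size))
    # Shuffle the indices first, as the images are sequential video frames.
    indices = torch.randperm(len(train_dataset)).tolist()
    num_valid = int(valid_split * len(train_dataset))
    train_subset = Subset(train_dataset, indices[:-num_valid])
    valid_subset = Subset(valid_dataset, indices[-num_valid:])
    return train_subset, valid_subset, train_dataset.classes

def get_data_loaders(train_set, valid_set, batch_size=64, num_workers=4):
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return train_loader, valid_loader
```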
Other Utility Functions
We also have a `utils.py` file that holds several helper functions and classes. These include functions to save the plots and the final model, and also the class to save the best model.
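The full file ships with the downloadable code. The best-model logic boils down to something like the sketch below (the class name, checkpoint keys, and output path here are illustrative, not necessarily the exact ones used in `utils.py`):

```python
# Illustrative sketch of best-model checkpointing; the real utils.py also
# saves the final model and the accuracy/loss plots shown later in this post.
import torch

class SaveBestModel:
    def __init__(self, best_valid_loss=float('inf')):
        self.best_valid_loss = best_valid_loss

    def __call__(self, current_valid_loss, epoch, model, out_path='../outputs/best_model.pth'):
        # Save a checkpoint whenever the validation loss improves.
        if current_valid_loss < self.best_valid_loss:
            self.best_valid_loss = current_valid_loss
            print(f"Best validation loss: {self.best_valid_loss}")
            print(f"Saving best model for epoch: {epoch}")
            torch.save({'epoch': epoch, 'model_state_dict': model.state_dict()}, out_path)
```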
Other than that, we have a `class_names.py` file which contains a list with all the class names. We will use this during inference to annotate the images/frames with the predicted class names.
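Based on the dataset classes listed earlier, `class_names.py` is essentially just a list along these lines (the variable name is illustrative):

```python
# class_names.py — the 22 workout classes, in the alphabetical order that
# torchvision's ImageFolder assigns to the class folders.
class_names = [
    'barbell biceps curl', 'bench press', 'chest fly machine', 'deadlift',
    'decline bench press', 'hammer curl', 'hip thrust', 'incline bench press',
    'lat pulldown', 'lateral raises', 'leg extension', 'leg raises',
    'plank', 'pull up', 'push up', 'romanian deadlift', 'russian twist',
    'shoulder press', 'squat', 't bar row', 'tricep dips', 'tricep pushdown'
]
```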
Fine Tuning the ResNet50 CNN Model for Workout Recognition
`train.py` is the driver script that will help us train the model. It supports quite a few command line arguments for easier control of the fine-tuning process.
- `--epochs`: The number of epochs we want to train the model for.
- `--learning-rate`: This argument controls the initial learning rate of the optimizer.
- `--batch-size`: This accepts an integer defining the batch size for the data loader.
- `--fine-tune`: A boolean argument controlling whether to train all the layers of the model or not.
- `--save-name`: The name of the model file which is to be saved to disk.
- `--scheduler`: A boolean argument controlling whether to apply learning rate scheduling or not.
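Putting these together, the argument parsing in `train.py` will look roughly like the following sketch (the default values, apart from the learning rate visible in the training logs, are assumptions):

```python
# Rough sketch of the command line arguments accepted by train.py.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=10,
                    help='number of epochs to train the model for')
parser.add_argument('--learning-rate', type=float, default=0.001,
                    help='initial learning rate for the optimizer')
parser.add_argument('--batch-size', type=int, default=32,
                    help='batch size for the data loaders')
parser.add_argument('--fine-tune', action='store_true',
                    help='retrain all layers of the pretrained model')
parser.add_argument('--save-name', default='model',
                    help='file name for the model saved to disk')
parser.add_argument('--scheduler', action='store_true',
                    help='apply a learning rate scheduler during training')
args = parser.parse_args()
```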
All the training experiments were carried out on a machine with 10 GB RTX 3080 GPU, 10th generation i7 CPU, and 32 GB of RAM.
To start the training, you may open a terminal in the `src` directory and execute the following command.
python train.py --epochs 10 --batch-size 64 --fine-tune --scheduler
Here, we are training the model for 10 epochs with a batch size of 64, while fine-tuning all the layers and applying a learning rate scheduler. The scheduler reduces the learning rate after 7 epochs.
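The training logs further down show the learning rate dropping from 1e-3 to 1e-4 at that point, which is exactly what a step scheduler with a milestone at epoch 7 and a factor of 0.1 would do. Here is a minimal sketch, assuming `MultiStepLR` and an SGD optimizer (the actual `train.py` may use different choices):

```python
# Minimal sketch: drop the learning rate 10x after 7 epochs.
import torch.optim as optim
from torch.optim.lr_scheduler import MultiStepLR
from model import build_model

model = build_model(fine_tune=True, num_classes=22)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# verbose=True prints "Adjusting learning rate of group 0 to ..." each epoch.
scheduler = MultiStepLR(optimizer, milestones=[7], gamma=0.1, verbose=True)

for epoch in range(10):
    # ... run one epoch of training and validation here ...
    scheduler.step()
```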
The following is the truncated output from the terminal.
```
[INFO]: Number of training images: 12468
[INFO]: Number of validation images: 1385
[INFO]: Classes: ['barbell biceps curl', 'bench press', 'chest fly machine', 'deadlift', 'decline bench press', 'hammer curl', 'hip thrust', 'incline bench press', 'lat pulldown', 'lateral raises', 'leg extension', 'leg raises', 'plank', 'pull up', 'push up', 'romanian deadlift', 'russian twist', 'shoulder press', 'squat', 't bar row', 'tricep dips', 'tricep pushdown']
Computation device: cuda
Learning rate: 0.001
Epochs to train for: 10
[INFO]: Fine-tuning all layers...
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  .
  .
  .
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=22, bias=True)
)
23,553,110 total parameters.
23,553,110 training parameters.
Adjusting learning rate of group 0 to 1.0000e-03.
[INFO]: Epoch 1 of 10
Training
100%|████████████████████| 195/195 [00:35<00:00, 5.54it/s]
Validation
100%|████████████████████| 22/22 [00:02<00:00, 10.75it/s]
Training loss: 0.461, training acc: 87.544
Validation loss: 0.274, validation acc: 91.986
Best validation loss: 0.27431226487864147
Saving best model for epoch: 1
Adjusting learning rate of group 0 to 1.0000e-03.
--------------------------------------------------
.
.
.
[INFO]: Epoch 10 of 10
Training
100%|████████████████████| 195/195 [00:37<00:00, 5.22it/s]
Validation
100%|████████████████████| 22/22 [00:02<00:00, 10.24it/s]
Training loss: 0.005, training acc: 99.856
Validation loss: 0.068, validation acc: 98.845
Best validation loss: 0.06834395979537723
Saving best model for epoch: 10
Adjusting learning rate of group 0 to 1.0000e-04.
--------------------------------------------------
TRAINING COMPLETE
```
The model reached the best validation accuracy of 98.84% on the last epoch.
Let’s take a look at the accuracy and loss graphs to get a better idea of the learning process.
From the above two figures, it is clear that both the loss and the accuracy were deteriorating after 5 epochs. The learning rate scheduler put them back on an improving trend.
Inference on Images
We have a few images in the `inference_data` directory which the model has not seen before. Let's run inference on these images and check how the model performs.
The code for inference on images resides in the `inference.py` script.
There are five images with the `.jpg` extension in the `inference_data` directory. The inference script accepts the path to a directory and carries out inference on all images with a `.jpg` extension. Further, the names of the images indicate the workout. Namely, these are the images in the directory:
barbell_bicep_curl.jpg
bench_press.jpg
deadlift.jpg
hammer_curl.jpg
lat_pulldown.jpg
All the image and video inference experiments were carried out on a machine with 10 GB RTX 3080 GPU, 10th generation i7 CPU, and 32 GB of RAM.
We can execute the following command to start the image inference process.
python inference.py --input ../inference_data/
In the above command, we just use the `--input` command line argument to provide the path to the inference directory.
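The overall flow inside `inference.py` is straightforward. The condensed sketch below captures the idea; the variable names, checkpoint keys, and preprocessing values mirror the earlier sketches in this post rather than the original script itself:

```python
# Condensed sketch of the image inference flow: load the best checkpoint,
# preprocess each .jpg like the validation set, and print the prediction.
import glob
import cv2
import torch
from torchvision import transforms
from model import build_model
from class_names import class_names

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = build_model(fine_tune=False, num_classes=22)
checkpoint = torch.load('../outputs/best_model.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.to(device).eval()

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

for image_path in glob.glob('../inference_data/*.jpg'):
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    input_tensor = transform(image).unsqueeze(0).to(device)
    with torch.no_grad():
        outputs = model(input_tensor)
    print(f"{image_path}: {class_names[outputs.argmax(1).item()]}")
```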
The model predicted the following workouts correctly.
The following are the images whose workout class the model was not able to predict correctly.
In the case of the lat pulldown exercise, it is understandable that the model predicts it as chest fly machine. If you take a look at the images in both ground truth folders, you will see that at times it may not be very clear even for a deep learning model to distinguish between them. Still, it looks like the model is doing pretty well after training.
Inference on Videos
Now, we will run inference on videos to check the real-time performance and the robustness of the model with moving scenes.
The `inference_video.py` script holds the code for inference on videos for workout recognition using CNN.
The `inference_data` directory contains two videos that we will carry out inference on. While executing the script, we need to provide the path to the input video.
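Internally, the script reads the video frame by frame with OpenCV, runs each frame through the same preprocessing and model, and writes out an annotated video. Here is a minimal sketch, assuming `model`, `transform`, `class_names`, and `device` are set up as in the image inference sketch above (the output path is illustrative):

```python
# Minimal sketch of the frame-by-frame video inference loop.
import cv2
import torch

cap = cv2.VideoCapture('../inference_data/plank.mp4')
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter('../outputs/inference_results/plank.mp4',
                      cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    input_tensor = transform(rgb).unsqueeze(0).to(device)
    with torch.no_grad():
        pred = class_names[model(input_tensor).argmax(1).item()]
    # Annotate the frame with the predicted workout name.
    cv2.putText(frame, pred, (15, 35), cv2.FONT_HERSHEY_SIMPLEX,
                1.0, (0, 255, 0), 2, cv2.LINE_AA)
    out.write(frame)

cap.release()
out.release()
```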
First, let’s try to run inference on a plank workout video.
python inference_video.py --input ../inference_data/plank.mp4
The model is performing really well here. It is predicting the plank exercise correctly in all the frames.
Now, let's run a second experiment with the push-up exercise, which looks very similar to the plank.
python inference_video.py --input ../inference_data/push_up.mp4
Interestingly, the model did not classify the exercise as plank even in a single frame. Our CNN model seems pretty robust at workout recognition.
Further Improvements
So far, our CNN-based workout recognition model is working extremely well. Obviously, we can train longer with more augmentations and a learning rate scheduler to make it even better. But there are some other ways to improve the model too.
- Using a person detector, cropping the detected person, and then recognizing the exercise they are performing (a rough sketch follows this list).
- Collecting more data from the internet and doing some kind of auto-annotation using the already trained model that we have. Then training for even more epochs on the larger dataset.
The above two steps are time-consuming but will surely lead to the improvement of the model.
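As a rough illustration of the first idea (this is not part of the project code), one could run a pretrained detector from torchvision, crop the most confident person detection, and feed only that crop to the workout classifier:

```python
# Hypothetical sketch: crop the detected person before classification.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

detector = fasterrcnn_resnet50_fpn(weights='DEFAULT').eval()

def crop_person(image_rgb, score_threshold=0.8):
    """Return the crop of the most confident detected person, or the full image."""
    with torch.no_grad():
        detections = detector([to_tensor(image_rgb)])[0]
    for box, label, score in zip(detections['boxes'], detections['labels'], detections['scores']):
        if label.item() == 1 and score.item() >= score_threshold:  # COCO label 1 is 'person'
            x1, y1, x2, y2 = box.int().tolist()
            return image_rgb[y1:y2, x1:x2]
    return image_rgb
```

The cropped region would then go through the same transforms and classifier as before.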
Summary and Conclusion
In this tutorial, we trained a convolutional neural network for workout recognition. After training, we also carried out inference on unseen data. Along with that, we discussed some points for improvement. I hope that this tutorial was helpful to you.
If you have any doubts, thoughts, or suggestions, please leave them in the comment section. I will surely address them.
You can contact me using the Contact section. You can also find me on LinkedIn, and Twitter.