Simple Facial Keypoint Detection using TensorFlow and Keras

Face-related applications make up a major area of computer vision. And with deep learning driving so many computer vision applications, it is driving face-related applications as well. From face recognition, to facial keypoint detection, to security and surveillance, all of these tend to use deep learning and computer vision in some way or the other, with convolutional neural networks as the leading architecture. As such, it is good to have some knowledge of applying deep learning and computer vision to face-related applications. With that in mind, we will solve a simple problem in this tutorial: a very simple training pipeline for facial keypoint detection using TensorFlow and Keras.

Landmarks and keypoints connected on a face.
Figure 1. Landmarks and keypoints connected on a face (Source).

This post will cover very basic and simple training for facial keypoint detection. As we will be using TensorFlow and Keras as the frameworks of choice, our work becomes even easier. This is just the starting point; in future posts, we will cover more advanced training along the same lines.

Let’s take a look at the points that we will cover in this post.

  • We will start off with a small discussion about the need for facial keypoint detection.
  • Next, we will discuss the dataset that we will use in this tutorial.
  • Then, we will discuss the directory structure, and all the library/framework-related dependencies and versions.
  • Following that, we will move on to the coding and training part of the tutorial.
  • After training and saving the model, we will also carry out predictions on the validation and test images.
  • In the end, we will discuss what we learned and some of the possible drawbacks of the approach that we follow in this tutorial.

Note…

If you are more of a PyTorch-favoring deep learning practitioner, then I have a similar training pipeline on the same dataset with PyTorch here.

Importance of Facial Keypoint Detection

By now, we have established that it is better to have some knowledge of facial keypoint detection and face applications in general. But what are some of the real-life applications of facial keypoint detection and computer vision-aided face applications? Let’s try to lay out some points here.

  • Security and surveillance: This is perhaps one of the best use cases for face recognition and facial landmark detection. Deep learning and computer vision models deployed on edge devices can help identify threats by recognizing persons on security cameras. This can help prevent a number of crimes that may otherwise be unavoidable.
  • Filters on smartphone apps: This is one of the fun use cases of facial landmark detection. Among the best known are the Snapchat filters, which work very well even with faces that are constantly moving. All of this is possible because of a combination of face detection and facial keypoint detection.
  • Securing devices: Locking and unlocking devices is also aided by facial keypoint detection. One of the common ones that many of us may have used is unlocking our smartphones via face recognition. This is also a combination of face detection, face recognition, and facial keypoint or landmark detection.

The above cover only a few applications. But there are numerous others. In fact, there are people in the computer vision and deep learning industry whose sole expertise lies in face detection, recognition, and other face-related applications.

It is pretty evident that it is always a combination of technologies that goes into the final applications. Here, in this tutorial, we will start with simple facial keypoint detection using TensorFlow and Keras.

The Facial Keypoint Dataset

In this tutorial, to start the simple facial keypoint detection using TensorFlow and Keras, we will use this dataset.

This Kaggle dataset is from the Facial Keypoints Detection competition. Here, we have to detect the location of keypoints on face images.

Interestingly, the dataset does not contain any raw image files. All the data is present in two CSV files. After downloading and extracting the dataset, you will get the training.csv and test.csv files. The training.csv file contains the data in the following format.

left_eye_center_x	left_eye_center_y	right_eye_center_x	right_eye_center_y ... Image
66.033564	        39.002274	         30.227008	        36.421678	       238 2236 237 ...

There are a total of 31 columns. The first 30 columns contain the keypoint coordinates for the faces. Note that the columns come in x and y pairs for each part of the face: first left_eye_center_x (x-coordinate) and left_eye_center_y (y-coordinate) for the left eye, then the same for the right eye, and so on. The final column is the Image column containing all the pixel values for a 96×96 resolution grayscale image. To use it properly, we will have to reshape these values later on. If you open the CSV file, you will get a pretty good idea about it.
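To get a better feel for this format, here is a small standalone sketch (not part of the tutorial scripts) that reads one row and reshapes its pixel string into a 96×96 array. The file path is an assumption based on the directory structure we will set up shortly.

import pandas as pd
import numpy as np

# Standalone sketch: inspect one row of training.csv (the path is an assumption).
df = pd.read_csv('../input/training.csv')
print(df.columns[:4])  # the first few keypoint columns
row = df.iloc[0]
print(row['left_eye_center_x'], row['left_eye_center_y'])
pixels = np.array(row['Image'].split(' '), dtype='float32')
image = pixels.reshape(96, 96)  # the pixel string becomes a 96x96 grayscale image
print(image.shape)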

After extracting the dataset, you may find two other CSV files as well, but we don’t need them. It is also worthwhile to keep in mind that the test.csv file only contains the image information, as that is to be used for the competition submission. We will test our trained model on images in this file later on.

For now, let’s take a look at some ground truth images that we can extract from the training.csv file.

Faces from the keypoint detection dataset.
Figure 2. Faces from the keypoint detection dataset.

As you can see, all the images are in grayscale format.

The following figure shows the same images but with the keypoints.

Faces from the keypoint detection dataset with ground truth keypoints plotted on them.
Figure 3. Faces from the keypoint detection dataset with ground truth keypoints plotted on them.

We can see the fifteen keypoints on the faces.

An Important Note About the Dataset

If you open the training.csv file, you will find that there are more than 7000 instances. But a lot of them are missing values in some or all of the keypoint columns. For obvious reasons, we cannot directly read the dataset from this CSV file and just feed it to our neural network. We need to clean this up first. We will get to know more about this step while preparing the dataset in the coding section.
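If you want a quick look at the extent of the problem before writing any training code, a small pandas check along these lines (again, the path is an assumption based on the directory structure shown in the next section) shows how many rows would survive dropping the incomplete ones.

import pandas as pd

# Quick check of missing values in training.csv (the path is an assumption).
df = pd.read_csv('../input/training.csv')
print(f"Total rows: {len(df)}")
print(f"Rows with no missing values: {len(df.dropna())}")
print(df.isnull().sum().sort_values(ascending=False).head())  # worst affected columns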

For now, please head over to the competition page and download the dataset. In the next section, we will see how to extract and structure the directory for the input files and other Python files as well.

Directory Structure

The following block shows the directory structure for all the files/folders.

├── input
│   ├── IdLookupTable.csv
│   ├── SampleSubmission.csv
│   ├── test.csv
│   └── training.csv
├── outputs
│   ├── saved_model
│   │   ├── assets
│   │   ├── variables
│   │   │   ├── variables.data-00000-of-00001
│   │   │   └── variables.index
│   │   ├── keras_metadata.pb
│   │   └── saved_model.pb
│   ├── test_results [1783 entries exceeds filelimit, not opening dir]
│   ├── validation_results [416 entries exceeds filelimit, not opening dir]
│   └── loss.png
├── src
│   ├── config.py
│   ├── dataset.py
│   ├── evaluate_and_test.py
│   ├── model.py
│   ├── train.py
│   └── utils.py

Let’s go over the important files/directories.

  • The input directory contains all the CSV files after extracting the dataset. Make sure to have your own structure like this as well.
  • The outputs directory will hold the saved TensorFlow model after training, and also the test and validation image results. It will also contain the loss graph that we obtain from training.
  • And the src directory contains all the Python files that we need for facial keypoint detection using TensorFlow and Keras. We will discuss these in the coding section.

Apart from the files in the input directory, you will get access to all others while downloading the zip file for this tutorial.

TensorFlow and Keras Versions

All the code in this tutorial has been developed using TensorFlow 2.7.0 and Keras 2.7.0. Be sure to have the same version or at least TensorFlow version 2.6.0. With the latest versions of TensorFlow 2.x, Keras will be automatically installed, so you don’t need to install it manually.
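If you want to quickly verify your setup, a minimal check like the following should do. The exact version numbers printed will, of course, depend on your installation.

import tensorflow as tf

# Quick sanity check of the installed TensorFlow and Keras versions.
print(tf.__version__)        # e.g. 2.7.0
print(tf.keras.__version__)  # e.g. 2.7.0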

Simple Facial Keypoint Detection using TensorFlow and Keras

Let’s start with the coding part of this tutorial. We will cover each of the Python files in the src directory in the following order:

  • config.py
  • utils.py
  • dataset.py
  • model.py
  • train.py

The above scripts will take us till the end of training the TensorFlow deep learning model on the facial keypoints dataset. Then we will use the evaluate_and_test.py to carry out inference on the validation and test set.

The Configuration File

The configuration Python file contains some of the general configurations we need for training and testing. They are the root data path, the learning parameters, and the train/test split percentage.

The following code will go into the config.py file.

# Input root path.
ROOT_PATH = '../input'

# Learning parameters.
BATCH_SIZE = 32
LR = 0.001
EPOCHS = 300
IMAGE_RESIZE = 96

# Train/test split.
TEST_SPLIT = 0.2

# Show dataset keypoint plot (executing `dataset.py`).
SHOW_DATASET_PLOT = True

As we can see, we have set up all the required configurations here. We can use these throughout the training and inference pipeline very easily. The only odd one here is the SHOW_DATASET_PLOT. If this is True, and we execute dataset.py from the terminal, then it will show us a plot of the faces with the ground truth facial keypoints. We will get into more details about these while writing the dataset preparation code.

Helper Functions and Utilities

Even a simple deep learning training pipeline will require some utility scripts and helper functions. And it is always better to have them in a separate module from the beginning and not in the same executable training script. This will ensure we can import them whenever we need them without cluttering other parts of the code.

Here, we will write four helper functions. And all of them will go into the utils.py file.

Helper Function to Plot Ground Truth and Predicted Keypoints on the Validation Data

Starting with the imports and the first helper function.

import matplotlib.pyplot as plt
import numpy as np
import os

plt.style.use('ggplot')

def evaluation_keypoints_plot(
    image, outputs, orig_keypoints, save_path
):
    """
    This function plots the regressed (predicted) keypoints from all the 
    evaluation images.
    """
    output_keypoint = outputs.reshape(-1, 2)
    orig_keypoint = orig_keypoints.reshape(-1, 2)
    image = image.reshape(96, 96)
    plt.style.use('default')
    plt.imshow(image, cmap='gray')
    plt.axis('off')
    for p in range(output_keypoint.shape[0]):
        plt.plot(output_keypoint[p, 0], output_keypoint[p, 1], 'r.')
        plt.text(output_keypoint[p, 0], output_keypoint[p, 1], f"{p}")
        plt.plot(orig_keypoint[p, 0], orig_keypoint[p, 1], 'g.')
        plt.text(orig_keypoint[p, 0], orig_keypoint[p, 1], f"{p}")

    plt.savefig(save_path)
    plt.close() 

We have the evaluation_keypoints_plot function in the above code block. It accepts the image, the predicted keypoints (outputs), the original keypoints (orig_keypoints), and the save_path string to save the image. This will plot the ground truth keypoints from the validation data and the predicted (regressed) keypoints on the current validation image. Note that the image we pass has dimensions (HxWx1). So, we need to reshape it before we can plot it using Matplotlib.

After plotting the keypoints, we save them to disk for later analysis. Taking a look at the images will give us a good idea of how well the TensorFlow deep learning model has learned.

Helper Function to Plot Predicted Keypoints from the Test Data

After training, we will also predict the keypoints on the image data present in the test.csv file. Let’s write a helper function for that as well. It will be very similar to the above one, except that there is no ground truth data.

def test_keypoints_plot(image, outputs, save_path):
    """
    This function plots the keypoints for the outputs and images
    from the `test.csv` file.
    """
    output_keypoint = outputs.reshape(-1, 2)
    image = image.reshape(96, 96)
    plt.style.use('default')
    plt.imshow(image, cmap='gray')
    plt.axis('off')
    for p in range(output_keypoint.shape[0]):
        plt.plot(output_keypoint[p, 0], output_keypoint[p, 1], 'r.')
        plt.text(output_keypoint[p, 0], output_keypoint[p, 1], f"{p}")

    plt.savefig(save_path)
    plt.close()

The above function only plots the regressed keypoints for the test images as the test data does not contain any ground truth data.

Helper Function to Plot Image and Keypoints from Sequence Data

Further in the coding part, we will write our custom Sequence dataset class to create training and validation datasets. Let’s write a function so that we can visualize the images and ground truth keypoints from the datasets.

def dataset_keypoints_plot(data):
    """
    This function shows the image faces and keypoint plots that the model
    will actually see. This is a good way to validate that our dataset is in
    fact correct and the faces align with the keypoint features. The plot 
    will be shown if you execute `dataset.py`.
    """
    plt.figure(figsize=(20, 40))
    for i in range(30):
        img = data[0][0][i]
        img = np.array(img, dtype='float32')
        img = img.reshape(96, 96)
        plt.subplot(5, 6, i+1)
        plt.imshow(img, cmap='gray')
        keypoints = data[0][1][i]
        keypoints = keypoints.reshape(-1, 2)
        for j in range(len(keypoints)):
            plt.plot(keypoints[j, 0], keypoints[j, 1], 'r.')
    plt.show()
    plt.close()

The dataset_keypoints_plot accepts a Sequence data object and plots 30 faces with their keypoints. After writing the code in dataset.py further on, if we execute that file from the terminal, then the image will be plotted.

Helper Function to Plot the Loss

Now, we just have one more helper function. It is a simple one for plotting and saving the loss graphs to disk.

def save_plots(history):
    """
    Function to save the loss plots to disk.
    """
    train_loss = history.history['loss']
    valid_loss = history.history['val_loss']
    # Loss plots.
    plt.figure(figsize=(12, 9))
    plt.plot(
        train_loss, color='orange', linestyle='-', 
        label='train loss'
    )
    plt.plot(
        valid_loss, color='red', linestyle='-', 
        label='validation loss'
    )
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.savefig(os.path.join(
        '..', 'outputs', 'loss.png'
    ))
    plt.show()

The save_plots accepts the training history object and saves the loss graph to disk.

Preparing the Dataset

Now, it’s time to complete one of the most important parts of the tutorial: writing the code to prepare the dataset for facial keypoint detection using TensorFlow and Keras.

We will write the dataset preparation code in the dataset.py file.

First, let’s import all the modules and libraries.

import cv2
import pandas as pd
import numpy as np
import config
import utils

from tensorflow.keras.utils import Sequence
from tqdm import tqdm

resize = config.IMAGE_RESIZE

Along with all the required libraries, we are also importing our own config and utils modules. And you can see that we are defining the resize variable according to the configuration file. We will use the Sequence class to prepare our custom dataset loader here, as that will be much easier to manage given the format in which we have the data.

Cleaning and Splitting the Data

As we discussed earlier, there are more than 7000 rows of data available in the training CSV file. But a lot of them have missing values. We have to clean the data so that there are no errors when extracting the values. In this situation, there are two options: we can either fill the missing columns with the average of the respective keypoint columns, or drop all the rows with missing data altogether.

The first approach is a bit error-prone, as there is no guarantee that the average keypoint values will fit well enough for all the images that are missing them. There is a very high chance that the model may learn from wrong data. For that reason, we will just drop those rows. Surely, we will have far fewer instances, but at least all of them will be correct.
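For reference, the mean-fill alternative we just decided against would look roughly like the following sketch; we will not use it anywhere in this tutorial.

import pandas as pd

# Mean-fill alternative (not used in this tutorial): fill each missing
# keypoint column with that column's mean value.
df_data = pd.read_csv('../input/training.csv')
df_filled = df_data.fillna(df_data.mean(numeric_only=True))
print(df_filled.iloc[:, :30].isnull().sum().sum())  # remaining missing keypoint values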

The following train_test_split function does two things:

  • Cleans up the data by dropping all the rows with missing values.
  • Splits the data into a train and validation set.

def train_test_split(csv_path, split):
    df_data = pd.read_csv(csv_path)
    # Drop all the rows with missing values.
    df_data = df_data.dropna()
    len_data = len(df_data)
    # Calculate the validation data sample length.
    valid_split = int(len_data * split)
    # Calculate the training data samples length.
    train_split = int(len_data - valid_split)
    training_samples = df_data.iloc[:train_split][:]
    valid_samples = df_data.iloc[-valid_split:][:]
    print(f"Training sample instances: {len(training_samples)}")
    print(f"Validation sample instances: {len(valid_samples)}")
    return training_samples, valid_samples

The above function accepts the CSV file path and the split ratio as parameters. After cleaning and splitting the data, it returns the training_samples and valid_samples data frames.

The Sequence Data Class

The next important thing here is to write a custom dataset using the Sequence class. We will not go into the details of the inner workings of the Sequence class here.

Let’s write the FaceKeypointDataset class here.

class FaceKeypointDataset(Sequence):
    def __init__(self, samples, batch_size):
        self.batch_size = batch_size
        self.data = samples
        # Get the image pixel column only.
        self.pixel_col = self.data.Image
        self.image_pixels = []
        for i in tqdm(range(len(self.data))):
            img = self.pixel_col.iloc[i].split(' ')
            self.image_pixels.append(img)

        self.images = np.array(self.image_pixels, dtype='float32')

    def __len__(self):
        return len(self.data) // self.batch_size
    
    def __getitem__(self, index):
        batch_images = self.images[index*self.batch_size:(index+1)*self.batch_size]
        batch_keypoints = self.data.iloc[index*self.batch_size:(index+1)*self.batch_size]
        final_images = []
        final_keypoints = []
        for j in range(self.batch_size):
            # Reshape the images into their original 96x96 dimensions.
            input_image = batch_images[j]
            image = input_image.reshape(96, 96)
            orig_h, orig_w = image.shape  # NumPy shape is (height, width)
            # Resize the image into `resize` defined above.
            image = cv2.resize(image, (resize, resize))
            # Again reshape to add grayscale channel format.
            image = image.reshape(resize, resize, 1)
            image = image / 255.0
            # Get the keypoints.
            keypoints = batch_keypoints.iloc[j][:30]
            keypoints = np.array(keypoints, dtype='float32')
            # Reshape the keypoints.
            keypoints = keypoints.reshape(-1, 2)
            # Rescale keypoints according to image resize.
            keypoints = keypoints * [resize / orig_w, resize / orig_h]
            keypoints = np.ravel(keypoints)
            final_images.append(image)
            final_keypoints.append(keypoints)
        final_images = np.array(final_images)
        final_keypoints = np.array(final_keypoints)
        return (final_images, final_keypoints)

It is mandatory to implement the __len__ and __getitem__ methods. Also, we need to ensure that the __getitem__ method returns a complete batch of data as per the batch_size passed on to the __init__ method.

The __init__ method above defines the batch_size, the data (either training or validation data frame), and stores all the image pixels in the images NumPy array.

As for the __getitem__ method, we return a batch of data in the form of final_images and final_keypoints NumPy arrays. Also, observe that we are resizing the images and adjusting the keypoints according to the resizing as per the resize parameter. This will ensure that we can freely choose any resizing factor in the configuration file.
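As a quick sanity check of the batching logic (assuming the train_test_split function and FaceKeypointDataset class defined above), indexing the dataset should return one complete batch with the expected shapes.

# Sanity check (assumes the definitions above and the default configuration).
training_samples, valid_samples = train_test_split(
    f"{config.ROOT_PATH}/training.csv", config.TEST_SPLIT
)
train_ds = FaceKeypointDataset(training_samples, batch_size=config.BATCH_SIZE)
batch_images, batch_keypoints = train_ds[0]
print(batch_images.shape)     # (32, 96, 96, 1) with the default configuration
print(batch_keypoints.shape)  # (32, 30)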

Function to Return the Sequence Data

Now, we just need to finish up the final part of dataset preparation. We will write two simple functions. One to prepare the training and validation data frames, and the other to return the training and validation Sequence datasets.

# Get the training and validation data samples.
training_samples, valid_samples = train_test_split(
    f"{config.ROOT_PATH}/training.csv",
    config.TEST_SPLIT
)

def get_data():
    train_ds = FaceKeypointDataset(training_samples, batch_size=config.BATCH_SIZE)
    valid_ds = FaceKeypointDataset(valid_samples, batch_size=config.BATCH_SIZE)
    return train_ds, valid_ds

if __name__ == '__main__':
    # Show dataset keypoint plots if enabled in `config.py`.
    if config.SHOW_DATASET_PLOT:
        _, valid_ds = get_data()
        utils.dataset_keypoints_plot(valid_ds)

The get_data function returns the training and validation datasets. Along with that, if we execute dataset.py itself, it will call the dataset_keypoints_plot function from utils, giving us the following output.

Images from the sequence data that will be fed into the model.
Figure 4. Images from the sequence data that will be fed into the model.

This completes our dataset preparation part.

The Neural Network Model

As you might have guessed by now, our neural network model is going to regress 30 keypoint values. Each pair of them corresponds to the (x, y) coordinates of one of the 15 keypoints on a face. As such, it will have 30 units in the final Dense layer.

Let’s write the model preparation code in model.py.

import tensorflow as tf

from tensorflow.keras import layers

def build_model(image_size):
    inputs = layers.Input(shape=(image_size[0], image_size[1], 1))
    # Conv => Activation => Pool blocks.
    x = layers.Conv2D(32, kernel_size=(5, 5))(inputs)
    x = layers.ReLU()(x)
    x = layers.MaxPool2D((2, 2))(x)

    x = layers.Conv2D(64, kernel_size=(3, 3))(x)
    x = layers.ReLU()(x)
    x = layers.MaxPool2D((2, 2))(x)

    x = layers.Conv2D(128, kernel_size=(3, 3))(x)
    x = layers.ReLU()(x)
    x = layers.MaxPool2D((2, 2))(x)

    x = layers.Conv2D(256, kernel_size=(3, 3))(x)
    x = layers.ReLU()(x)
    x = layers.MaxPool2D((2, 2))(x)
    
    x = layers.Conv2D(512, kernel_size=(3, 3))(x)
    x = layers.ReLU()(x)
    x = layers.MaxPool2D((2, 2))(x)
    
    # Linear layers.
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(units=256)(x)
    x = layers.ReLU()(x)
    x = layers.Dense(units=30)(x)

    model = tf.keras.Model(inputs, outputs=x)
    return model

As you can see, instead of hardcoding the input image size, we are passing that as a parameter to the build_model function. This will ensure that the model code remains consistent with the image resizing in the config.py file.

Apart from that, the model itself is pretty simple. First, we have a stack of 2D convolutional layers, each followed by a ReLU activation and a 2D max-pooling layer.

The linear part of the network consists of two Dense layers, out of which the final one is the regression head with 30 units.

The Training Script

The training script is perhaps the simplest part of the entire pipeline. As we have defined almost all the components earlier, we just need to connect each of them and start the training.

We will write the training script code in the train.py file.

The following block contains the code for the entire training script.

from model import build_model
from dataset import get_data
from utils import save_plots

import config
import tensorflow as tf

# Model checkpoint callback.
model_ckpt = tf.keras.callbacks.ModelCheckpoint(
    filepath='../outputs/saved_model',
    monitor='val_loss',
    mode='auto',
    save_best_only=True
)

# Load the training and validation data.
train_ds, valid_ds = get_data()

# Build and compile the model.
model = build_model((config.IMAGE_RESIZE, config.IMAGE_RESIZE))
print(model.summary())
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=config.LR), 
    loss=tf.keras.losses.MeanSquaredError(), 
)

# Train the model.
history = model.fit(
    train_ds,
    validation_data=valid_ds,
    epochs=config.EPOCHS,
    callbacks=[model_ckpt],
    workers=4, 
    use_multiprocessing=True
)

save_plots(history)

We import all the custom modules and tensorflow as well.

First, we create a model checkpoint callback to save the best model according to the validation loss value.

Then we load the training and validation datasets.

Next, we initialize the model and compile it. Observe that we are using only the MeanSquaredError loss here and no other metric. As this is a regression problem, an accuracy metric does not make much sense; a proper loss function is sufficient.

Finally, we train the model with the callbacks and save the loss plots to disk.

Note: If you are on Windows OS, consider removing/commenting out the workers and use_multiprocessing arguments in the model.fit() method. Using these two arguments generally tends to freeze the training on Windows OS after the model initialization.
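In that case, the fit call would look something like this (same model, datasets, and callback as above, just without the multiprocessing arguments):

# On Windows OS: the same training call without the multiprocessing arguments.
history = model.fit(
    train_ds,
    validation_data=valid_ds,
    epochs=config.EPOCHS,
    callbacks=[model_ckpt],
)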

Executing train.py and Analyzing the Results

To run the training script, simply execute the following command in the terminal/command line within the src directory.

Note: We train here for 300 epochs. Even with moderately powerful hardware, it should not take more than 15-20 minutes to train.

python train.py 

The following block shows the truncated output from the terminal.

Training sample instances: 1712
Validation sample instances: 428
100%|███████████████████████████████████████████████████████████████████████████████████████████| 1712/1712 [00:00<00:00, 3709.54it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 428/428 [00:00<00:00, 4553.79it/s]
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 96, 96, 1)]       0         
                                                                 
 conv2d (Conv2D)             (None, 92, 92, 32)        832       
                                                                 
 re_lu (ReLU)                (None, 92, 92, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2D  (None, 46, 46, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 44, 44, 64)        18496     
                                                                 
 re_lu_1 (ReLU)              (None, 44, 44, 64)        0         
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 22, 22, 64)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 20, 20, 128)       73856     
                                                                 
 re_lu_2 (ReLU)              (None, 20, 20, 128)       0         
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 10, 10, 128)      0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 8, 8, 256)         295168    
                                                                 
 re_lu_3 (ReLU)              (None, 8, 8, 256)         0         
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 4, 4, 256)        0         
 2D)                                                             
                                                                 
 conv2d_4 (Conv2D)           (None, 2, 2, 512)         1180160   
                                                                 
 re_lu_4 (ReLU)              (None, 2, 2, 512)         0         
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 1, 1, 512)        0         
 2D)                                                             
                                                                 
 global_average_pooling2d (G  (None, 512)              0         
 lobalAveragePooling2D)                                          
                                                                 
 dense (Dense)               (None, 256)               131328    
                                                                 
 re_lu_5 (ReLU)              (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 30)                7710      
                                                                 
=================================================================
Total params: 1,707,550
Trainable params: 1,707,550
Non-trainable params: 0
_________________________________________________________________
Epoch 1/300
53/53 [==============================] - 4s 34ms/step - loss: 588.1769 - val_loss: 87.8794
Epoch 2/300
53/53 [==============================] - 2s 29ms/step - loss: 61.1080 - val_loss: 40.1440
Epoch 3/300
53/53 [==============================] - 2s 32ms/step - loss: 21.1378 - val_loss: 28.5753
Epoch 4/300
53/53 [==============================] - 2s 31ms/step - loss: 24.3583 - val_loss: 27.4814
Epoch 5/300
53/53 [==============================] - 2s 32ms/step - loss: 14.5680 - val_loss: 22.9587
...
Epoch 299/300
53/53 [==============================] - 1s 14ms/step - loss: 0.7052 - val_loss: 9.1143
Epoch 300/300
53/53 [==============================] - 1s 15ms/step - loss: 1.3051 - val_loss: 7.9403

By the end of training, the training loss hovers between 0.5 and 1.3 and the validation loss is somewhere around 7.9. These are not bad results considering such a simple model and simple training pipeline.

Loss after training the facial keypoint detection model using TensorFlow and Keras.
Figure 5. Loss after training the facial keypoint detection model using TensorFlow and Keras.

The loss plots do not show any sign of overfitting. Training for a few more epochs may well give even better results.

Inference for Facial Keypoint Detection using the Saved Model

After the training is complete, we have the trained model inside outputs/saved_model. We will use this trained model for two things:

  • For predicting the keypoints in the validation set and checking out how well the predicted keypoints match the ground truth ones.
  • For predicting the keypoints on the images in the test.csv file.

To accomplish the above two, we need to write a simple script.

The code for this will go into the evaluate_and_test.py file.

Let’s go over the code briefly, starting as usual with the import statements. Along with that, we create the appropriate output directories to save the results, load the saved model, and load the validation data as well.

"""
Script to evaluate the model on the validation dataset and
test it on the test dataset. Also, plot all the results and save them to disk.
"""

import tensorflow as tf
import os
import pandas as pd
import numpy as np

from dataset import get_data
from utils import evaluation_keypoints_plot, test_keypoints_plot
from tqdm import tqdm

# Create directory to save validation results.
validation_result_path = os.path.join('..', 'outputs', 'validation_results')
os.makedirs(os.path.join(validation_result_path), exist_ok=True)

# Create directory to save test results.
test_result_path = os.path.join('..', 'outputs', 'test_results')
os.makedirs(os.path.join(test_result_path), exist_ok=True)

model = tf.keras.models.load_model('../outputs/saved_model')
print(model.summary())

_, valid_ds = get_data()

The validation image results will be saved in the validation_results directory and the test image results in the test_results directory.

The Evaluation and Test Functions

Next, we will write two functions. One for the validation dataset and one for predicting on the test.csv data.

def evaluate(valid_ds):
    # Get the results.
    results = model.predict(valid_ds)
    # Loop over the validation set and save the 
    # images and corresponding result plot to disk.
    counter = 0
    for i, batch in tqdm(enumerate(valid_ds), total=len(valid_ds)):
        for j, (image, keypoints) in enumerate(zip(batch[0], batch[1])):
            evaluation_keypoints_plot(
                image, results[counter], keypoints,
                save_path=os.path.join(validation_result_path, str(counter)+'.png')
            )
            counter += 1

def test(test_csv_path):
    """
    Function to predict on all images present in `test.csv` file
    """
    test_df = pd.read_csv(test_csv_path)
    images = test_df.Image
    for i in tqdm(range(len(images)), total=len(images)):
        image = images.iloc[i].split(' ')
        image = np.array(image, dtype=np.float32) / 255.
        image = image.reshape(96, 96)
        image = image.reshape(96, 96, 1)
        image_batch = np.expand_dims(image, axis=0)
        image_tensor = tf.convert_to_tensor(image_batch)
        outputs = model.predict(image_tensor)
        test_keypoints_plot(
            image, outputs, 
            save_path=os.path.join(test_result_path, str(i)+'.png')
        )

print('Evaluating...')
evaluate(valid_ds)

print('Testing...')
test(test_csv_path=os.path.join('..', 'input', 'test.csv'))

The evaluate function predicts on the validation dataset, loops over all the images, and saves the image with the ground truth and predicted keypoints plotted on it.

Similarly, the test function samples the images from the test.csv file, does the necessary preprocessing, and predicts on those images. Then we save the resulting images to disk after plotting the keypoints on them.

In the end, we call both functions to carry out the inference.

Executing evaluate_and_test.py for Inference

Execute the following command within the src directory.

python evaluate_and_test.py 

The model may take a few minutes to run through all the images. You should see output similar to the following.

Evaluating...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:48<00:00,  3.72s/it]
Testing...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1783/1783 [04:44<00:00,  6.26it/s]

The following are some of the good results from inference on the validation data.

A few good validation predictions for keypoint detection using TensorFlow and Keras.
Figure 6. A few good validation predictions for keypoint detection using TensorFlow and Keras.

The above are some of the predictions which match closely with the ground truth data.

Now, a few bad predictions on the validation data.

Bad validation predictions for keypoint detection using TensorFlow and Keras.
Figure 7. Bad validation predictions for keypoint detection using TensorFlow and Keras.

The model predicts the keypoints around the eyes poorly here. In some cases, the mouth keypoints are not very good either.

The following figure shows a few of the keypoints prediction results from the test data.

Keypoint prediction results on the test data.
Figure 8. Keypoint prediction results on the test data.

Although we do not have the ground truth data for the test images, most of the results still look pretty good here.

A Few Takeaways, Advantages, and Disadvantages

This completes our implementation of simple facial keypoint detection using TensorFlow and Keras in this tutorial.

One of the advantages of the current approach is that our model is very simple with less than 2 million parameters. This means that it will be pretty fast even during video inference.

But there are a few disadvantages as well. For example, our model has been trained on grayscale images and will not work on RGB images. To carry out image and video inference on RGB images, we would either need to train another model on RGB data, or convert the RGB images to grayscale first, which is not very convenient. A rough sketch of the latter approach follows.
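The sketch below is only illustrative and not part of the tutorial code; the preprocess_rgb_frame helper is a hypothetical name, and the frame is assumed to be a BGR image as read by OpenCV (for example, from cv2.VideoCapture). It mirrors the preprocessing we used during training.

import cv2

# Illustrative only: convert a BGR frame to the grayscale format the model
# was trained on. `frame` could come from cv2.VideoCapture, for example.
def preprocess_rgb_frame(frame, size=96):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (size, size))
    gray = gray.astype('float32') / 255.0
    return gray.reshape(1, size, size, 1)  # batch of one for model.predict()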

For further experiments, we can try out the following things:

  • We can try building a larger model and see how it performs.
  • Maybe even train on larger images by resizing them.
  • We can also try and augment the images. But we need to be careful, as we will have to take care of the keypoint transforms in that case also.

If you try out any of the above, do let others know about your results in the comment section.

Summary and Conclusion

We covered a very simple training pipeline for facial keypoint detection using TensorFlow and Keras here. We analyzed the training and inference results and discussed some of the key advantages and disadvantages. I hope that this tutorial was helpful to you.

If you have any doubts, thoughts, or suggestions, please leave them in the comment section. I will surely address them.

You can contact me using the Contact section. You can also find me on LinkedIn and Twitter.
