Sparse Autoencoders using L1 Regularization with PyTorch



Autoencoder deep neural networks are an unsupervised learning technique. An autoencoder learns to map its input back to its output while passing the data through a compressed internal representation. In the process of copying the input data to the output data, it learns many features about the data. In this article, we will learn about sparse autoencoders. Specifically, we will cover the following about sparse autoencoders in this article:

  • A brief overview of deep autoencoders.
    • Problems with deep autoencoders.
  • What are sparse autoencoders?
  • Why are sparse autoencoders important?
  • How to code a sparse autoencoder using the PyTorch deep learning library.

In a series of previous articles, I have described the working of autoencoders, deep autoencoders, convolutional autoencoders, and denoising autoencoders.

You can refer to those articles if you are starting out with autoencoder neural networks. I hope that you will learn a lot, and I would love to know your thoughts in the comment section.

A Brief About Autoencoders

Basically, autoencoding is a data compression technique. An autoencoder has two parts, the encoder and the decoder. The autoencoder neural network learns a compressed representation of the input data and then reconstructs the input from that representation.

The encoder encodes the data, and the decoder decodes the data. For example, let the input data be \(x\). Then, we can define the encoding function as \(f(x)\). A hidden layer \(h\) learns the encoding and we get, \(h = f(x)\). Finally, a decoding function \(g\) reconstructs the input as \(r = g(h)\). You can learn more about the basics of autoencoders from this article.
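To make the notation concrete, here is a minimal PyTorch sketch of the encoder \(f\) and decoder \(g\). The layer sizes here are arbitrary and chosen only for illustration:

import torch
import torch.nn as nn

# encoder f and decoder g as single linear layers (sizes are arbitrary)
f = nn.Linear(784, 16)   # h = f(x), the compressed encoding
g = nn.Linear(16, 784)   # r = g(h), the reconstruction

x = torch.randn(1, 784)  # a dummy flattened input
h = f(x)
r = g(h)
print(h.shape, r.shape)  # torch.Size([1, 16]) torch.Size([1, 784])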

Problems with Deep Autoencoder

We know that autoencoders, in general, try to map the input data to the output data.

Take the case of a deep, fully connected autoencoder. Suppose that you are trying to map the very popular FashionMNIST images, and you set the number of input neurons equal to the number of pixels, that is, 784. In such cases, there is a very high probability that the autoencoder neural network will just copy the input as it is after a few epochs of training. It will not be able to learn any important features of the data. The following image will give you a better idea.

FashionMNIST images reconstructed by an autoencoder without any sparsity.

The above image shows FashionMNIST images reconstructed by an autoencoder without adding any sparsity. We can see that the model is reaching a point where it will remember everything about the images and just copy them. Maybe a few more iterations and it will be able to do that. We want to avoid this.

Why are Sparse Autoencoders Important?

Now moving on to sparse autoencoders. In a sparse autoencoder, we can still use fully connected layers with the number of input neurons equal to the image dimensionality. But by adding a sparsity regularization, we will be able to stop the neural network from copying the input.

Mainly, there are two ways to add sparsity constraints to deep autoencoders.

  • L1 regularization, which we will use in this article.
  • KL divergence, which we will address in the next article.

How to Use L1 Regularization for Sparsity

We will add the L1 sparsity constraint to the activations of the neurons after the ReLU function. This will push many of the activations toward zero, which adds a sparsity effect to the network.

The following formula will make things clearer.

$$
L_1 = \lambda \sum_{i} |a_{i}|
$$

Here, \(\lambda\) is the regularization parameter, and the \(a_{i}\) are the activation values of the hidden units. (Classical L1 weight regularization penalizes the weights \(w_{i}\) instead; here, following the code we will write, we penalize the activations.) We will add this regularization term to the loss function, say MSELoss. So, the final cost becomes,

$$
Cost = MSELoss + \lambda \sum_{i} |a_{i}|
$$

We will implement all of this through coding, and then, things will become even clearer.
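As a quick preview of how the two terms will combine in code, here is a minimal sketch. The names outputs, img, reg_param, and activations are placeholders for variables we define later in the article; in particular, activations stands for a list of the per-layer ReLU outputs, and the imports come later as well:

# a minimal sketch of the final cost (not the actual training code)
mse = nn.MSELoss()(outputs, img)
l1 = sum(torch.mean(torch.abs(act)) for act in activations)
cost = mse + reg_param * l1  # Cost = MSELoss + lambda * L1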

Sparse Autoencoder Neural Network using PyTorch

We will use the FashionMNIST dataset for this article. Along with that, the PyTorch deep learning library will help us control many of the underlying factors, so we can experiment our way through this with ease.

Before moving further, I would like to bring to the readers' attention this GitHub repository by tmac1997. It has an implementation of L1 regularization with autoencoders in PyTorch. We will be using some of that code, although we will need to change it for our particular use case.

Let’s start with the coding.

Directory Structure

For this tutorial, we will use the following directory structure.

├───input
├───outputs
│   └───images
└───src
        sparse_ae_l1.py
  • input folder will contain the FashionMNIST images that we will download using the torchvision datasets module.
  • outputs will contain the trained model that we will save and the loss plot as well. The subdirectory images will contain the images that the autoencoder will reconstruct on the validation dataset.
  • src contains the Python file sparse_ae_l1.py, which will contain all of the code that we will write.

Importing Modules

import torch
import torchvision
import torch.nn as nn
import matplotlib
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import torch.nn.functional as F
import torch.optim as optim
import os
import time
import numpy as np
import argparse

from tqdm import tqdm
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision.utils import save_image

matplotlib.style.use('ggplot')

Some of the important modules that we are using are:

  • torchvision to get the FashionMNIST dataset, apply transforms, and save the torch tensors easily as images.
  • torch.nn for accessing the neural network layers and activations in PyTorch.
  • DataLoader to prepare the iterable data loader to feed into the neural network model.
  • argparse to construct the argument parser.

Constructing the Argument Parser

We will now construct the argument parser. Using command line arguments while executing the Python file will make it easier for us to control some important parameters.

# constructing argument parsers 
ap = argparse.ArgumentParser()
ap.add_argument('-e', '--epochs', type=int, default=10,
	help='number of epochs to train our network for')
ap.add_argument('-l', '--reg_param', type=float, default=0.001, 
    help='regularization parameter `lambda`')
ap.add_argument('-sc', '--add_sparse', type=str, default='yes',
    help='whether to add sparsity constraint or not')
args = vars(ap.parse_args())

epochs = args['epochs']
reg_param = args['reg_param']
add_sparsity = args['add_sparse']
learning_rate = 1e-3
batch_size = 32

print(f"Add sparsity regularization: {add_sparsity}")
  • --epochs defines the number of epochs that we will train our autoencoder neural network for.
  • --reg_param is the regularization parameter lambda.
  • --add_sparse is a string, either ‘yes’ or ‘no’. It tells whether we want to add the L1 regularization constraint or not. And obviously, it will be ‘yes’ in this tutorial.
  • On lines 11, 12, and 13, we extract the argument values into variables so that they will be easier to use further along.
  • On lines 14 and 15 we are specifying the learning rate and batch size.

Prepare the Data

We will download the dataset using the torchvision dataset module. We will also apply transforms to the image data. For the transforms, we will only convert the data into torch tensors.

# image transformations
transform = transforms.Compose([
    transforms.ToTensor(),
])

trainset = datasets.FashionMNIST(
    root='../input/data',
    train=True, 
    download=True,
    transform=transform
)
testset = datasets.FashionMNIST(
    root='../input/data',
    train=False,
    download=True,
    transform=transform
)
 
# trainloader
trainloader = DataLoader(
    trainset, 
    batch_size=batch_size,
    shuffle=True
)
#testloader
testloader = DataLoader(
    testset, 
    batch_size=batch_size, 
    shuffle=False
)
  • First at line 2, we define the transforms.
  • Then from lines 6 to 17, we download the training and test dataset that will be stored in input/data folder. If you already have the dataset, then it will not be downloaded.
  • From lines 19 to 30, we prepare the iterable train loader and test loader using the PyTorch DataLoader. The batch size is 32 and we are only shuffling the train loader. (A quick batch-shape check follows this list.)
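
If you want to confirm what one batch looks like, here is a quick optional check (the shapes in the comments are what FashionMNIST with a batch size of 32 should produce):

# fetch one batch from the train loader
img, labels = next(iter(trainloader))
print(img.shape)    # torch.Size([32, 1, 28, 28])
print(labels.shape) # torch.Size([32])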

Some Helper Functions

In this section, we will define some helper functions that will make our work a little bit easier and automated. So, we will define three functions, namely, get_device(), make_dir(), and save_decoded_image(). Let’s write the code first, then we will get into the explanation part.

# get the computation device
def get_device():
    if torch.cuda.is_available():
        device = 'cuda:0'
    else:
        device = 'cpu'
    return device
device = get_device()

# make the `images` directory
def make_dir():
    image_dir = '../outputs/images'
    if not os.path.exists(image_dir):
        os.makedirs(image_dir)
make_dir()

# for saving the reconstructed images
def save_decoded_image(img, name):
    img = img.view(img.size(0), 1, 28, 28)
    save_image(img, name)
  • Starting from line 2, we define the get_device() function. This will grab either the CUDA GPU device or the CPU for computations during the training of the autoencoder neural network model. Although a GPU is not strictly necessary for working with the FashionMNIST dataset, it is always better to have one for neural network training purposes.
  • Next from line 11, we define make_dir(). This function creates an images folder inside the outputs folder. Here, we will save all the images that are reconstructed by the autoencoder during validation.
  • Finally, from lines 18 to 20 we define save_decoded_image(). This function saves torch tensor outputs as images by reshaping them into 1x28x28 dimensionality. It takes two input parameters, the image tensor and the name of the file.

This finishes all of the preliminary coding parts. Now, we can get into the neural network coding and the core of this article. That is, training an autoencoder neural network with the sparsity penalty.

Define the Sparse Autoencoder Neural Network

In this section, we will define our sparse autoencoder neural network module.

An autoencoder neural network has two parts, an encoder and a decoder. We will only use nn.Linear layers of the PyTorch deep learning library. We will not be using convolutional layers.

Let’s name our module SparseAutoencoder(). The following code block defines the SparseAutoencoder() module.

# define the autoencoder model
class SparseAutoencoder(nn.Module):
    def __init__(self):
        super(SparseAutoencoder, self).__init__()
 
        # encoder
        self.enc1 = nn.Linear(in_features=784, out_features=256)
        self.enc2 = nn.Linear(in_features=256, out_features=128)
        self.enc3 = nn.Linear(in_features=128, out_features=64)
        self.enc4 = nn.Linear(in_features=64, out_features=32)
        self.enc5 = nn.Linear(in_features=32, out_features=16)
 
        # decoder 
        self.dec1 = nn.Linear(in_features=16, out_features=32)
        self.dec2 = nn.Linear(in_features=32, out_features=64)
        self.dec3 = nn.Linear(in_features=64, out_features=128)
        self.dec4 = nn.Linear(in_features=128, out_features=256)
        self.dec5 = nn.Linear(in_features=256, out_features=784)
 
    def forward(self, x):
        # encoding
        x = F.relu(self.enc1(x))
        x = F.relu(self.enc2(x))
        x = F.relu(self.enc3(x))
        x = F.relu(self.enc4(x))
        x = F.relu(self.enc5(x))
 
        # decoding
        x = F.relu(self.dec1(x))
        x = F.relu(self.dec2(x))
        x = F.relu(self.dec3(x))
        x = F.relu(self.dec4(x))
        x = F.relu(self.dec5(x))
        return x
model = SparseAutoencoder().to(device)
  • In __init__(), starting from line 7, first, we have the encoder part of the autoencoder network. We have five encoder layers, starting from 784 in_features. This refers to the 784 pixels that the Fashion MNIST images have. Continuing to reduce this until self.enc5, we have 32 in_features and 16 out_features.
  • Then from lines 14 to 18, we define the decoder part of the network. Starting from self.dec1 till self.dec5, we keep on increasing the number of neurons. This continues till we reach 784 out_features in self.dec5.
  • We have the forward() function from line 20. This executes the actual encoding and decoding of our autoencoder neural network. The outputs of all the encoder and decoder layers pass through ReLU activation functions during encoding and decoding.
  • Next, we initialize the SparseAutoencoder() module and load it onto the computation device. (A quick shape sanity check follows this list.)
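
As an optional sanity check that the model maps a flattened image back to 784 values, you can run a dummy batch through it (the values are random; we only care about the shapes):

# optional sanity check with a dummy batch of 4 flattened images
dummy = torch.randn(4, 784).to(device)
print(model(dummy).shape)  # expected: torch.Size([4, 784])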

We also need the loss function and optimizer for our autoencoder neural network model.

Loss Function and Optimizer

For the loss function, we will use MSELoss (Mean Squared Error Loss) as we need the error between the actual pixels and the reconstructed pixels. The optimizer is going to be Adam with a learning rate of 0.001.

# the loss function
criterion = nn.MSELoss()
# the optimizer
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

Computing the Sparse Loss

To compute the sparse loss, we need the activations of the neural network layers. That is, the sparsity penalty is computed on the layer outputs after they pass through the ReLU activation functions. Computing the penalty on the raw model parameters alone would not give us the activation sparsity we want.

First, we need to get hold of all the model children.

# get the layers as a list
model_children = list(model.children())

You can print all the children, for example with the short loop below, and you will see the following output.
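
for child in model_children:
    print(child)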

Linear(in_features=784, out_features=256, bias=True)
Linear(in_features=256, out_features=128, bias=True)
Linear(in_features=128, out_features=64, bias=True)
Linear(in_features=64, out_features=32, bias=True)
Linear(in_features=32, out_features=16, bias=True)
Linear(in_features=16, out_features=32, bias=True)
Linear(in_features=32, out_features=64, bias=True)
Linear(in_features=64, out_features=128, bias=True)
Linear(in_features=128, out_features=256, bias=True)
Linear(in_features=256, out_features=784, bias=True)

We will define a sparse_loss() function that takes the autoencoder model and the images as input parameters. Then we will calculate the sparsity loss as the images pass through each layer and its ReLU activation.

The following code block shows how to do this.

# define the sparse loss function
def sparse_loss(autoencoder, images):
    loss = 0
    values = images
    for child in autoencoder.children():
        values = F.relu(child(values))
        loss += torch.mean(torch.abs(values))
    return loss
  • Starting from line 5, we loop over the model's layers. Inside the loop, we pass the running values through each layer and its ReLU activation (line 6).
  • At line 7, we accumulate the mean of the absolute activation values into the loss.
  • Finally, we return the loss.

One important thing to note here is that this is not the final sparsity penalty. We get the penalty after multiplying reg_param with this loss; the total cost is then the MSELoss plus that penalty.
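
To make the scale concrete, here is a purely illustrative calculation; the numbers are made up, not taken from a real run:

# purely illustrative numbers, not from a real run
l1 = 2.0                  # suppose sparse_loss(model, img) returned this
penalty = reg_param * l1  # 0.001 * 2.0 = 0.002
# the total cost would then be mse_loss + penalty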

The Training and Validation Functions

Here, we will define our training and validation functions. We will call them fit() and validate() respectively.

The Training Function

# define the training function
def fit(model, dataloader, epoch):
    print('Training')
    model.train()
    running_loss = 0.0
    counter = 0
    for i, data in tqdm(enumerate(dataloader), total=int(len(trainset)/dataloader.batch_size)):
        counter += 1
        img, _ = data
        img = img.to(device)
        img = img.view(img.size(0), -1)
        optimizer.zero_grad()
        outputs = model(img)
        mse_loss = criterion(outputs, img)
        if add_sparsity == 'yes':
            l1_loss = sparse_loss(model, img)
            # add the sparsity penalty
            loss = mse_loss + reg_param * l1_loss
        else:
            loss = mse_loss
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    epoch_loss = running_loss / counter
    print(f"Train Loss: {loss:.3f}")

    # save the reconstructed images every 5 epochs
    if epoch % 5 == 0:
        save_decoded_image(outputs.cpu().data, f"../outputs/images/train{epoch}.png")
    return epoch_loss
  • The fit() function takes three parameters: the autoencoder model, the data loader (trainloader), and the current epoch number.
  • running_loss at line 5 will help us accumulate the batch-wise loss. Also, we will use counter (line 6) to calculate the per-epoch loss.
  • From line 7, we iterate through the data. We only take the images (img) at line 9, as we do not need the labels for the autoencoder neural network.
  • At line 11, we flatten the images, as they will be passed into linear layers and not convolutional layers.
  • Line 12 clears the accumulated gradients, and we get the outputs at line 13.
  • At line 14, we get the mse_loss. Then at line 16, we call the sparse_loss() function and add the weighted sparsity penalty to the MSE loss at line 18. In our case, line 20 does not execute, since add_sparse is 'yes'.
  • Line 21 backpropagates the gradients, line 22 updates the model parameters, and line 23 accumulates the batch loss.
  • We calculate the epoch_loss at line 25 and save the reconstructed training images every 5 epochs (lines 29 and 30).
  • Line 31 returns the epoch_loss.

The Validation Function

In the validation function:

  • We don’t need to backpropagate the gradients.
  • We don’t update the model parameters.

And everything will be within the with torch.no_grad() block. Everything else will be similar to the fit() function.

# define the validation function
def validate(model, dataloader, epoch):
    print('Validating')
    model.eval()
    running_loss = 0.0
    counter = 0
    with torch.no_grad():
        for i, data in tqdm(enumerate(dataloader), total=int(len(testset)/dataloader.batch_size)):
            counter += 1
            img, _ = data
            img = img.to(device)
            img = img.view(img.size(0), -1)
            outputs = model(img)
            loss = criterion(outputs, img)
            running_loss += loss.item()

    epoch_loss = running_loss / counter
    print(f"Val Loss: {loss:.3f}")  

    # save the reconstructed images every 5 epochs
    if epoch % 5 == 0:
        outputs = outputs.view(outputs.size(0), 1, 28, 28).cpu().data
        save_image(outputs, f"../outputs/images/reconstruction{epoch}.png")
    return epoch_loss

Executing the Training and Validation Functions

We will train and validate our autoencoder network by wrapping everything in a for loop.

# train and validate the autoencoder neural network
train_loss = []
val_loss = []
start = time.time()
for epoch in range(epochs):
    print(f"Epoch {epoch+1} of {epochs}")
    train_epoch_loss = fit(model, trainloader, epoch)
    val_epoch_loss = validate(model, testloader, epoch)
    train_loss.append(train_epoch_loss)
    val_loss.append(val_epoch_loss)
end = time.time()
 
print(f"{(end-start)/60:.3} minutes")

# save the trained model
torch.save(model.state_dict(), f"../outputs/sparse_ae{epochs}.pth")
  • The train_loss and val_loss lists will store the per-epoch training and validation loss values. We need these values to plot the loss graph at the end.
  • We then train and validate our model for the number of epochs specified in the command line arguments.
  • At line 16 we save the trained model as a .pth file. (A short loading sketch follows this list.)
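
If you later want to reload the trained weights, for example for inference, a minimal sketch would look like the following (the file name assumes the same epochs value used during training):

# a minimal sketch for loading the saved weights back later
model = SparseAutoencoder().to(device)
state = torch.load(f"../outputs/sparse_ae{epochs}.pth", map_location=device)
model.load_state_dict(state)
model.eval()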

In the end, we just need to save the loss plot.

# loss plots
plt.figure(figsize=(10, 7))
plt.plot(train_loss, color='orange', label='train loss')
plt.plot(val_loss, color='red', label='validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.savefig('../outputs/loss.png')
plt.show()

Executing the Python File

To execute the sparse_ae_l1.py file, you need to be inside the src folder. From there, type the following command in the terminal.

python sparse_ae_l1.py --epochs=25 --add_sparse=yes

We are training the autoencoder model for 25 epochs and adding the sparsity regularization as well. Here is a short snippet of the output that we get.

Add sparsity regularization: yes
Linear(in_features=784, out_features=256, bias=True)
Linear(in_features=256, out_features=128, bias=True)
Linear(in_features=128, out_features=64, bias=True)
Linear(in_features=64, out_features=32, bias=True)
Linear(in_features=32, out_features=16, bias=True)
Linear(in_features=16, out_features=32, bias=True)
Linear(in_features=32, out_features=64, bias=True)
Linear(in_features=64, out_features=128, bias=True)
Linear(in_features=128, out_features=256, bias=True)
Linear(in_features=256, out_features=784, bias=True)
Epoch 1 of 25
Training
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1875/1875 [00:44<00:00, 42.09it/s]
Train Loss: 0.051
Validating
313it [00:01, 200.41it/s]
Val Loss: 0.057
...
Training
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1875/1875 [00:45<00:00, 41.25it/s]
Train Loss: 0.027
Validating
313it [00:01, 197.38it/s]
Val Loss: 0.032
Epoch 25 of 25
Training
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1875/1875 [00:45<00:00, 41.01it/s]
Train Loss: 0.025
Validating
313it [00:01, 176.61it/s]
Val Loss: 0.032

Analyzing the Results

From the loss values, we can infer that our model is learning well. Still, we should take a look at the loss plots that we have saved. Later, we will also analyze the images that the autoencoder neural network has reconstructed.

Graphical plot of the loss values from training the autoencoder neural network.

We can see that there is no overfitting while training the model, which is a good thing. Now, let's take a look at the images reconstructed by the autoencoder.

Sparse autoencoder FashionMNIST reconstructed images after epoch 0.
Sparse autoencoder FashionMNIST reconstructed images after epoch 20.

The first image shows the reconstructions after the first epoch. The images are blurry due to the additional sparsity penalty. After 20 epochs (the second image), the autoencoder is able to create somewhat clearer images. These are still blurry, but the network is able to capture important details like the handles of the bags.

These reconstructions are different from those of fully connected autoencoders without any sparsity. In those cases, the autoencoder neural network can simply copy the image if we train it long enough. Adding the sparsity penalty pushes the autoencoder to capture the important details rather than just copying the input image to the output.


Summary and Conclusion

In this article, you learned how to add the L1 sparsity penalty to an autoencoder neural network so that it does not just copy the input image to the output but instead captures the important features of the images. In the next article, we will learn how to use KL divergence to add the sparsity penalty. If you have any thoughts, doubts, or suggestions, please leave them in the comment section. I will try my best to address them.

You can contact me using the Contact section. You can also find me on LinkedIn, and Twitter.


11 thoughts on “Sparse Autoencoders using L1 Regularization with PyTorch”

  1. Guangye Tian says:

    Hello Sovit,

    Thank you very much for your sharing! A little question: in the math part, the L1 regularization term is defined as the weighted sum of the absolute values of all the weights of the neural network. However, in the Python code, it is calculated using the activations of each layer. I am wondering if I understand correctly? Are the two equivalent?

    Thank you! Guangye

    1. Sovit Ranjan Rath says:

      First of all, I am happy that you found the tutorial helpful. Now, coming to your question. Your question is quite good and relevant as well. For writing this article, I followed this paper by Andrew Ng: https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf
      Please do take a look at page 14. I am explaining my intuition here. After we apply the activation to the units, we only have the weights of those units which are active; no inactive unit's weights are present. From the paper, I also gathered that it would make more sense to apply sparsity only to those weights that are active. We would not want to apply sparsity to all the weights and then discard some of them, as that would mean performing unnecessary operations that are discarded later. This is my take on it.
      But if you have another thought on this, please do share. I would love to hear it.

      1. Guangye says:

        Hello Sovit,

        Thank you for your explanation! It’s very helpful.

        Guangye

        1. Sovit Ranjan Rath says:

          You are welcome and glad to help.

  2. AK says:

    Hello
    If we run this code on Google Colab, how can we access the images formed? Also, how will we be able to access the input and output folders?
    Also, how can we apply sparsity with convolutional autoencoders?

    1. Sovit Ranjan Rath says:

      Hello AK. To run the code and access the saved images in Colab, we have to do a bit of folder setup in the Colab environment. Currently, I am trying to provide Colab access for all my previous and future code as well. That will take some time.
      Now, coming to applying sparsity to convolutional autoencoders, I have not written an article on that yet. I will try my best to cover it in the future.

  3. Anh says:

    Hey, I think your implementation of the flow for the sparse loss only makes sense if the loss is KL divergence, not L1 reg. L1 reg is the sum of the absolute values of the weights (not the activations of the linear layers' outputs).

    1. Sovit Ranjan Rath says:

      I will look it up. I am not working with autoencoders that actively right now, so I cannot say anything concretely. Thanks for informing me. If need be, I will update it.

  4. SL says:

    I think sparsity should be applied to activations, not weights. Isn't it?

    1. Sovit Ranjan Rath says:

      Hello. I will need to take a detailed look at the code. I wrote this article a long time ago and need to check again.

      Still, thanks for bringing it to my attention.
