In this tutorial, you will learn the basics of TensorFlow GradientTape. GradientTape was introduced in TensorFlow 2.0 and is available in all later versions (2.1, 2.2, …). But what is TensorFlow GradientTape actually? And how does it help? Let's find out in this tutorial.
This blog post is the second in the series Getting Started with TensorFlow.
- Introduction to Tensors in TensorFlow.
- Basics of TensorFlow GradientTape.
We are going to cover the following topics in this tutorial.
- What is TensorFlow Gradient Tape?
- TensorFlow GradientTape on a Variable.
- `GradientTape()` on a `tf.constant()` Tensor.
- Controlling Trainable Variables.
- Combining everything we learned into a single code block.
Note: This is a very introductory tutorial to TensorFlow GradientTape and will mainly help those who are completely new to either deep learning or TensorFlow. If you are familiar with the topic, please leave your thoughts in the comment section on what can be improved.
Before Moving Forward…
If you are completely new to TensorFlow, then I highly recommend going through the first post of the series. There, we cover the installation of TensorFlow 2.5 on our local system and the basics of tensors in TensorFlow. It will surely help in understanding the concepts of this tutorial as well.
Also, all the tutorials in the Getting Started with TensorFlow series will use TensorFlow 2.5.
Directory Structure
The following is the directory structure we are going to follow.
```
├── gradient_tape.ipynb
```
We have just one Jupyter Notebook for this tutorial, that is, `gradient_tape.ipynb`. If you decide to type out the code while following the tutorial, I highly recommend using Jupyter Notebooks, as you can execute the cells individually and get the outputs instantly. This workflow is great for learning new topics in deep learning and machine learning.
What is TensorFlow GradientTape?
Starting from version 2.0, TensorFlow provides the `tf.GradientTape()` API. This helps in carrying out automatic differentiation, which in turn drives backpropagation while training neural networks.
Using the `tf.GradientTape` API, we can compute gradients with respect to some input variables. But for this, we need to track and record all the operations that happen, and `tf.GradientTape` helps with that as well. The tracking and recording of operations happen during the forward pass. Then, during the backward pass, `tf.GradientTape` traverses the recorded operations in reverse order to compute the gradients.
There are a few things to keep in mind while using `tf.GradientTape` to record operations.

- Operations are only recorded if they happen within the `tf.GradientTape` context.

```python
with tf.GradientTape() as tape:
    # ... carry out some operation
```

- `tf.GradientTape` can only track and record operations for those tensors which are trainable, like `tf.Variable`s, by default.
- For tracking operations on constant tensors (`tf.constant`), we need to tell the GradientTape to `watch()` the tensor. This is because constant tensors are not trainable by default.
The above points will become clear once we start the coding part of this tutorial.
If you need to learn more about automatic differentiation and backpropagation, then please visit the Wikipedia pages on those topics. They will surely help.
Let’s start with the coding part of the tutorial.
GradientTape in TensorFlow
We will slowly build upon the concepts in this tutorial while learning about GradientTape. In the final section, we will combine all the code into one place to get a clear picture of everything.
TensorFlow GradientTape on a Variable
Let’s start with using the TensorFlow GradientTape on a TensorFlow Variable. We can create a variable in TensorFlow using the `tf.Variable` class.
Variables in TensorFlow are tensors whose values can be changed during runtime and whose operations can be tracked.
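As a quick aside, here is a tiny sketch (my own example, not part of the tutorial's notebook) showing that a variable's value can be updated in place:

```python
import tensorflow as tf

# variables can be updated in place, unlike constant tensors
v = tf.Variable(1.0)
v.assign(2.0)      # `v` now holds 2.0
v.assign_add(0.5)  # `v` now holds 2.5
print(v.numpy())   # 2.5
```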
Take a look at the following code block.
```python
import tensorflow as tf

x = tf.Variable(20.0)
print(x)

with tf.GradientTape() as tape:
    y = x**2 # dy = 2x * dx = 2*20.0 = 40.0

# calculate the gradient of `y` with respect to `x`
dy_dx = tape.gradient(y, x)
print(dy_dx)
```
First, we need to import TensorFlow, which we do with the alias `tf`. Now, let's go through the code thoroughly.
- We define a tensor with value 20.0 using `tf.Variable` and store it in `x`. It is a rank-0 tensor (learn more about tensors of different ranks here).
- From line 6, we begin the `with tf.GradientTape()` block.
- Inside it, we assign `y` as \(x^2\).
- On line 10, we use `tape.gradient()` to calculate the gradient of `y` with respect to `x`.
- `tape.gradient()` calculates the gradient of a target with respect to a source. That is, `tape.gradient(target, sources)`, where both `target` and `sources` are tensors.
After all the operations are complete within the `GradientTape` context, we print the result.
```
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=20.0>
tf.Tensor(40.000004, shape=(), dtype=float32)
```
We can see that the result is a TensorFlow `Tensor` object with the value 40.000004. This value is the gradient of the `target`. Also note that the data type of `x` is `float32` by default. This is because we passed 20.0 as the value, which is a floating-point number. This bit is important, and we will get to the reason shortly.
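One aside before moving on: `sources` is not limited to a single tensor. Although this tutorial sticks to one source at a time, `tape.gradient()` also accepts a list of sources and returns one gradient per source. A small sketch (my own example) of what that looks like:

```python
x0 = tf.Variable(2.0)
x1 = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x0**2 + x1**3

# passing a list of sources returns a list of gradients
grads = tape.gradient(y, [x0, x1])
print(grads[0])  # dy/dx0 = 2*x0 = 4.0
print(grads[1])  # dy/dx1 = 3*x1**2 = 27.0
```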
GradientTape on an Integer Variable
Above, we tried `GradientTape()` on a floating-point number. Now, let's try with an integer value as well and see what happens.
```python
x = tf.Variable(30)
print(x)

with tf.GradientTape() as tape:
    y = x**3
    print(y)

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
- First, we define `x` as a TensorFlow `Variable` and initialize it to 30.
- Then we start the `tf.GradientTape()` context and try to calculate the gradient of `y` after assigning `y` as \(x^3\).
The following is the output that we get.
```
<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=30>
tf.Tensor(27000, shape=(), dtype=int32)
WARNING:tensorflow:The dtype of the target tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
None
```
- We get the confirmation that indeed `int32` data was passed into the `tf.GradientTape()` context.
- We are also printing the resulting `y` value, which is of type `int32` as well.
- But after that, we do not get the expected results, or at least the results we got in the case of a floating-point number. We get a warning that the `target` and `source` tensors must be floating-point when calling `GradientTape.gradient`. And we know that `x` and `y` are integers.
- Finally, the gradients are not calculated, and we get the result as `None`.
Now, there will be cases when we pass integer values by mistake while creating a `tf.Variable` instance. To mitigate such issues, we can always define the data type explicitly, whether we pass an integer or a floating-point value. This works in all cases, and we no longer need to worry whether we pass 30 or 30.0 to `tf.Variable`.
Let’s take a look at a simple example.
```python
x = tf.Variable(30, dtype=tf.float32)
print(x)

with tf.GradientTape() as tape:
    y = x**3

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
- On line 1, we create a TensorFlow `Variable` with the value 30. With this, we also assign the `dtype` as `tf.float32` explicitly.
- In the lines that follow, we carry out the usual gradient calculation within the `tf.GradientTape()` context.
The following is the output.
```
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=30.0>
tf.Tensor(2699.9998, shape=(), dtype=float32)
```
We can see that `x` is indeed of the `float32` data type and that its value has been converted from 30 to 30.0 automatically. The rest of the code works as expected, and we get the gradient result as well.

This shows that by taking care of the data type ourselves, we can avoid many unexpected errors while calculating gradients.
Using GradientTape on a 1D Tensor Variable
Until now, all the examples of gradient calculation were on rank-0 tensors, that is, scalars. But we can calculate gradients using `GradientTape` on tensors of any rank.
For example, the following code block shows how to use GradientTape on a 1D tensor.
```python
x = tf.Variable([1.0, 2.0, 3.0], dtype=tf.float32)

with tf.GradientTape() as tape:
    y = x**3
    print(y)

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
There is nothing special about the above code. Instead of a rank-0 tensor, we are just creating a `tf.Variable` instance holding a rank-1 tensor. Then we calculate the gradient within the `tf.GradientTape` context. The following block shows the result.
```
tf.Tensor([ 1.  8. 27.], shape=(3,), dtype=float32)
tf.Tensor([ 3. 12. 27.], shape=(3,), dtype=float32)
```
`y` is also a 1-dimensional tensor, where each value is the cube of the corresponding value of the tensor `x`. The gradient is a 1-dimensional tensor as well, where each value is the derivative \(3x^2\) of the corresponding element. That's all there is to it while dealing with gradients of tensors with ranks greater than 0. You can try a few more examples with rank-2 and rank-3 tensors and see what kind of results you get, as in the sketch below.
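For instance, a rank-2 version (my own quick sketch, not from the original notebook) would look like this, with the gradient computed element-wise just as in the 1D case:

```python
x = tf.Variable([[1.0, 2.0], [3.0, 4.0]], dtype=tf.float32)

with tf.GradientTape() as tape:
    y = x**2

# each gradient value is 2x for the corresponding element
dy_dx = tape.gradient(y, x)
print(dy_dx)  # [[2. 4.] [6. 8.]]
```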
GradientTape on a tf.constant() Tensor
Until now, we have seen how GradientTape works with TensorFlow `Variable`s (`tf.Variable()`), whose operations and values are tracked by default.
But we can also create constant tensors in TensorFlow using `tf.constant()`. The operations of constant tensors are not tracked by default. So, how does `GradientTape` behave in such cases while calculating the gradients? Let's check with a simple example.
```python
x = tf.constant(3.0)
print(x)

with tf.GradientTape() as tape:
    y = x**2 # dy = 2x * dx

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
- On line 1, we create a constant tensor with the value 3.0. It will have `dtype=tf.float32`.
- Starting from line 4, we calculate the gradient of `y` with respect to `x` within the `tf.GradientTape()` context.
So, what is the final result that we get?
```
tf.Tensor(3.0, shape=(), dtype=float32)
None
```
The result is `None`. Unlike `Variable` tensors, the `GradientTape()` API does not watch the operations of constant tensors by default. That's why the operations are not tracked, and the gradients are not calculated either.
How do we solve this issue? We can explicitly tell the `GradientTape()` API to watch the tensor `x`. After that, all its operations will be tracked. Take a look at the following code block.
```python
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    # tell the GradientTape() API to watch the tensor `x` explicitly
    tape.watch(x)
    y = x**2 # dy = 2x * dx

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
Take a look at line 5 closely. We are using `tape.watch()` to watch and track the operations of the tensor `x`. And the following is the result.
```
tf.Tensor(6.0, shape=(), dtype=float32)
```
The gradients are now calculated as expected. You can use `tape.watch()` on `tf.Variable()` tensors as well, but that is not required, as their operations are tracked by default.
Controlling Trainable Variables
Here, we will learn how to control which trainable and non-trainable variables `GradientTape()` watches.
`tf.Variable` has a `trainable` parameter that is `True` by default. This ensures that all the operations of the variable are tracked. But sometimes we may not need the variable to be trainable; this can vary from one use case to another. For this, we can pass `trainable=False` while creating the variable. In such cases, the `tf.GradientTape()` API does not `watch()` the variable's operations anymore.
```python
x = tf.Variable(5.0, trainable=False)

# GradientTape() does not watch non-trainable parameters/variables
with tf.GradientTape() as tape:
    y = x**3

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
```
None
```
You see, we get the result as `None`, as `GradientTape()` does not watch the operations on `x` anymore.
We can still bypass this by telling the `tf.GradientTape()` API to watch the variable, just as we did in the case of constant tensors.
```python
x = tf.Variable(5.0, trainable=False)

# we need to watch non-trainable variables explicitly
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x**3

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
```
tf.Tensor(74.99999, shape=(), dtype=float32)
```
By using `tape.watch(x)`, the operations are now tracked, and we are able to calculate the gradient of `y` with respect to `x`.
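Where does this come in handy? One common case is freezing part of a model. As a small sketch (my own example, not from the original notebook), we can mix a trainable and a non-trainable variable in the same tape; only the trainable one gets a gradient:

```python
w = tf.Variable(2.0)                   # trainable=True by default
b = tf.Variable(1.0, trainable=False)  # frozen

with tf.GradientTape() as tape:
    y = w * 3.0 + b

# only the watched (trainable) variable gets a gradient
dw, db = tape.gradient(y, [w, b])
print(dw)  # tf.Tensor(3.0, shape=(), dtype=float32)
print(db)  # None, since `b` is not watched
```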
Watching Accessed Variables
All the `tf.Variable()` instances that have `trainable=True` are accessed by `tf.GradientTape()` for tracking operations and calculating gradients. This is the default behavior.
```python
x0 = tf.Variable(5.0)

# watch_accessed_variables is True by default
with tf.GradientTape(watch_accessed_variables=True) as tape:
    y0 = x0**2

dy_dx0 = tape.gradient(y0, x0)
print(dy_dx0)
```
```
tf.Tensor(10.0, shape=(), dtype=float32)
```
In the above code example, we are passing `watch_accessed_variables=True`, which is the default, so we need not actually specify it.
But what if we have trainable variables but do not want the `tf.GradientTape()` API to track their operations for gradient calculation? We can simply pass `watch_accessed_variables=False` to the GradientTape context.
```python
x0 = tf.Variable(5.0)

# we can tell GradientTape not to watch any accessed variables
with tf.GradientTape(watch_accessed_variables=False) as tape:
    y0 = x0**2

dy_dx0 = tape.gradient(y0, x0)
print(dy_dx0)
```
```
None
```
And we get the result as `None`.
Now, the above scenario can be helpful when we have more than one trainable variable but only want to compute the gradients of the selected ones.
In such cases, we can use `tape.watch()` to watch those tensors whose gradients we want to calculate. For now, let's try that out with one variable.
```python
x0 = tf.Variable(5.0)

# choose which variables to watch
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(x0)
    y0 = x0**2

dy_dx0 = tape.gradient(y0, x0)
print(dy_dx0)
```
```
tf.Tensor(10.0, shape=(), dtype=float32)
```
After passing `x0` to `tape.watch()`, we can calculate the gradients even though `watch_accessed_variables` is `False`.
Computing Gradient of More Than One Variable with persistent=True
By default, `GradientTape` releases the resources it holds as soon as we call the `GradientTape.gradient()` method. So, we cannot call `gradient()` more than once within the same context by default. Instead, we can pass `persistent=True` while creating the `GradientTape()` context, which allows us to compute gradients multiple times.
```python
x0 = tf.Variable(5.0)
x1 = tf.Variable(10.0)

# `persistent=False` by default
with tf.GradientTape() as tape:
    y0 = x0**2
    y1 = x1**2

dy_dx0 = tape.gradient(y0, x0)
dy_dx1 = tape.gradient(y1, x1)
print(dy_dx0)
print(dy_dx1)
```
You will get the following error with the above code.
```
RuntimeError: A non-persistent GradientTape can only be used to compute one set of gradients (or jacobians)
```
The following code block shows the same example with `persistent=True`.
```python
x0 = tf.Variable(5.0)
x1 = tf.Variable(10.0)

# using `persistent=True`
with tf.GradientTape(persistent=True) as tape:
    y0 = x0**2
    y1 = x1**2

dy_dx0 = tape.gradient(y0, x0)
dy_dx1 = tape.gradient(y1, x1)
print(dy_dx0)
print(dy_dx1)
```
```
tf.Tensor(10.0, shape=(), dtype=float32)
tf.Tensor(20.0, shape=(), dtype=float32)
```
And we get the expected results.
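One small note: since a persistent tape holds on to its resources until it is garbage collected, the TensorFlow documentation recommends dropping the reference to the tape once you are done with it:

```python
# release the resources held by the persistent tape
del tape
```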
Combining Everything We Learned into a Single Code Block
Let’s combine the things we have learned so far into a single code block so that every concept will be in one place.
```python
import tensorflow as tf

# TensorFlow GradientTape on a floating-point variable
x = tf.Variable(20.0, dtype=tf.float32)
print(x)
with tf.GradientTape() as tape:
    y = x**2 # dy = 2x * dx = 2*20.0 = 40.0
dy_dx = tape.gradient(y, x)
print(f"TensorFlow GradientTape on a floating variable: {dy_dx}\n")

# TensorFlow GradientTape on an integer variable
x = tf.Variable(30)
print(x)
with tf.GradientTape() as tape:
    y = x**3 # resulting `y` will be `tf.int32` as well
    print(y)
dy_dx = tape.gradient(y, x)
print(f"TensorFlow GradientTape on an integer variable: {dy_dx}\n")

# GradientTape on a 1D tensor variable
x = tf.Variable([1.0, 2.0, 3.0], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = x**3
    print(y)
dy_dx = tape.gradient(y, x)
print(f"GradientTape on a 1D tensor variable: {dy_dx}\n")

# GradientTape on a constant tensor
x = tf.constant(3.0)
with tf.GradientTape() as tape:
    # tell the GradientTape() API to watch the tensor `x` explicitly
    tape.watch(x)
    y = x**2 # dy = 2x * dx
dy_dx = tape.gradient(y, x)
print(f"GradientTape on a constant tensor: {dy_dx}\n")

"""
Controlling trainable variables
"""
x = tf.Variable(5.0, trainable=False)
# GradientTape() does not watch non-trainable parameters/variables
with tf.GradientTape() as tape:
    y = x**3
dy_dx = tape.gradient(y, x)
print(f"GradientTape does not watch non-trainable variables: {dy_dx}")

x = tf.Variable(5.0, trainable=False)
# we need to watch non-trainable variables
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x**3
dy_dx = tape.gradient(y, x)
print(f"We need to tell GradientTape to watch non-trainable variables explicitly: {dy_dx}\n\n")

"""
GradientTape on more than one variable with `persistent=True`
"""
x0 = tf.Variable(5.0)
x1 = tf.Variable(10.0)
# using `persistent=True`
with tf.GradientTape(persistent=True) as tape:
    y0 = x0**2
    y1 = x1**2
dy_dx0 = tape.gradient(y0, x0)
dy_dx1 = tape.gradient(y1, x1)
print(dy_dx0)
print(dy_dx1)
```
This brings us to the end of the coding part of this tutorial.
Summary and Conclusion
In this tutorial, we covered the very basics of the TensorFlow GradientTape API. We did not cover everything; there is a lot more to explore in GradientTape. But this should act as a good starting point for anyone who is just starting to learn TensorFlow. I hope that you learned something new in this tutorial.
If you have any doubts, thoughts, or suggestions, please leave them in the comment section. I will surely address them.
You can contact me using the Contact section. You can also find me on LinkedIn, and Twitter.
Hello!
Excellent tutorial I must say. Can you possibly help me figure out how to do nested gradients with respect to multiple variables. I can’t figure out how to take a gradient of a gradient with respect to different variables.
Thanks for your time!
Hello Travis. Glad that you liked the tutorial.
So, are you looking for something like tf.GradientTape() inside another tf.GradientTape() context?
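If so, a minimal sketch of a second derivative with nested tapes (note that the inner gradient has to be computed inside the outer context) would look something like this:

```python
x = tf.Variable(3.0)

with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x**3
    # first derivative: dy/dx = 3x^2 = 27.0
    dy_dx = inner_tape.gradient(y, x)

# second derivative: d2y/dx2 = 6x = 18.0
d2y_dx2 = outer_tape.gradient(dy_dx, x)
print(dy_dx)
print(d2y_dx2)
```

For multiple variables, you can pass a list of sources to either `gradient()` call, just like with a single tape.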