In this tutorial, you will learn the basics of TensorFlow GradientTape. GradientTape was introduced in TensorFlow 2.0 and is available in all later versions (2.1, 2.2, …). But what is TensorFlow GradientTape actually? And how does it help? Let's find out in this tutorial.
This blog post is the second in the series Getting Started with TensorFlow.
- Introduction to Tensors in TensorFlow.
- Basics of TensorFlow GradientTape.
We are going to cover the following topics in this tutorial.
- What is TensorFlow Gradient Tape?
- TensorFlow GradientTape on a Variable.
- `GradientTape()` on a `tf.constant()` Tensor.
- Controlling Trainable Variables.
- Combining everything we learned into a single code block.
Note: This is a very introductory tutorial to TensorFlow GradientTape and will mainly help those who are completely new to either deep learning or TensorFlow. If you are familiar with the topic, please leave your thoughts in the comment section on what can be improved.
Before Moving Forward…
If you are completely new to TensorFlow, then I highly recommend going through the first post of the series. There, we cover the installation of TensorFlow 2.5 on our local system and the basics of tensors in TensorFlow. It will surely help in understanding the concepts of this tutorial as well.
Also, all the tutorials in the Getting Started with TensorFlow series will use TensorFlow 2.5.
Directory Structure
The following is the directory structure we are going to follow.
```
├── gradient_tape.ipynb
```
We have just one Jupyter Notebook for this tutorial, that is, `gradient_tape.ipynb`. If you decide to type out the code while following the tutorial, I highly recommend using Jupyter Notebooks, as you can execute the cells individually and get the outputs instantly. This workflow is great for learning new topics in deep learning and machine learning.
What is TensorFlow GradientTape?
Starting from version 2.0, TensorFlow provides the `tf.GradientTape()` API. This helps in carrying out automatic differentiation, which in turn drives backpropagation while training neural networks.
Using the `tf.GradientTape` API, we can compute gradients with respect to some input variables. But for this, we need to track and record all the operations that happen, and `tf.GradientTape` helps with that as well. The tracking and recording of operations happen during the forward pass. Then, during the backward pass, `tf.GradientTape` traverses the recorded operations in reverse order to compute the gradients.
There are a few things to keep in mind while using `tf.GradientTape` to record operations.

- Operations are only recorded if they happen within the `tf.GradientTape` context.

```python
with tf.GradientTape() as tape:
    # ... carry out some operation
```

- `tf.GradientTape` can only track and record operations for those tensors which are trainable, like `tf.Variable`s, by default.
- For tracking operations on constant tensors (`tf.constant`), we need to tell the GradientTape to `watch()` the tensor. This is because constant tensors are not trainable by default.
The above points will become clear once we start the coding part of this tutorial.
If you need to learn more about automatic differentiation and backpropagation, then please visit the Wikipedia pages on those topics. They will surely help.
Let’s start with the coding part of the tutorial.
GradientTape in TensorFlow
We will slowly build upon the concepts in this tutorial while learning about GradientTape. In the final section, we will combine all the code into one place to get a clear picture of everything.
TensorFlow GradientTape on a Variable
Let’s start with using the TensorFlow GradientTape on a TensorFlow Variable. We can create a variable in TensorFlow using the `tf.Variable` class.
Variables in TensorFlow are tensors whose values can be changed during runtime and whose operations can be tracked.
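As a quick aside, here is a tiny sketch (my own example, not part of the tutorial's notebook) showing that a variable's value can be updated in place:

```python
import tensorflow as tf

# variables can be updated in place, unlike constant tensors
v = tf.Variable(1.0)
v.assign(2.0)      # `v` now holds 2.0
v.assign_add(0.5)  # `v` now holds 2.5
print(v.numpy())   # 2.5
```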
Take a look at the following code block.
```python
import tensorflow as tf

x = tf.Variable(20.0)
print(x)

with tf.GradientTape() as tape:
    y = x**2 # dy = 2x * dx = 2*20.0 = 40.0

# calculate the gradient of `y` with respect to `x`
dy_dx = tape.gradient(y, x)
print(dy_dx)
```
First, we need to import TensorFlow, which we do with the alias `tf`. Now, let's go through the code thoroughly.
- We define a tensor with value 20.0 using `tf.Variable` and store it in `x`. It is a rank-0 tensor (learn more about tensors of different ranks here).
- From line 6, we begin the `with tf.GradientTape()` block.
- Inside it, we assign `y` as \(x^2\).
- On line 10, we use `tape.gradient()` to calculate the gradient of `y` with respect to `x`.
- `tape.gradient()` calculates the gradient of a target with respect to a source. That is, `tape.gradient(target, sources)`, where both `target` and `sources` are tensors.
After all the operations are complete within the `GradientTape` context, we print the result.
```
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=20.0>
tf.Tensor(40.000004, shape=(), dtype=float32)
```
We can see that the result is a TensorFlow `Tensor` object with the value 40.000004. This value is the gradient of the `target`. Also note that the data type of `x` is `float32` by default. This is because we passed 20.0 as the value, which is a floating-point number. This bit is important, and we will get to the reason shortly.
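One aside before moving on: `sources` is not limited to a single tensor. Although this tutorial sticks to one source at a time, `tape.gradient()` also accepts a list of sources and returns one gradient per source. A small sketch (my own example) of what that looks like:

```python
x0 = tf.Variable(2.0)
x1 = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x0**2 + x1**3

# passing a list of sources returns a list of gradients
grads = tape.gradient(y, [x0, x1])
print(grads[0])  # dy/dx0 = 2*x0 = 4.0
print(grads[1])  # dy/dx1 = 3*x1**2 = 27.0
```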
GradientTape on an Integer Variable
Above, we tried `GradientTape()` on a floating-point number. Now, let's try with an integer value as well and see what happens.
```python
x = tf.Variable(30)
print(x)

with tf.GradientTape() as tape:
    y = x**3
    print(y)

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
- First, we define `x` as a TensorFlow `Variable` and initialize it to 30.
- Then we start the `tf.GradientTape()` context and try to calculate the gradient of `y` after assigning `y` as \(x^3\).
The following is the output that we get.
```
<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=30>
tf.Tensor(27000, shape=(), dtype=int32)
WARNING:tensorflow:The dtype of the target tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
WARNING:tensorflow:The dtype of the source tensor must be floating (e.g. tf.float32) when calling GradientTape.gradient, got tf.int32
None
```
- We get the confirmation that indeed `int32` data was passed into the `tf.GradientTape()` context.
- We are also printing the resulting `y` value, which is of type `int32` as well.
- But after that, we do not get the expected results, or at least the results we got in the case of a floating-point number. We get a warning that the `target` and `source` tensors must be floating-point when calling `GradientTape.gradient`. And we know that `x` and `y` are integers.
- Finally, the gradients are not calculated, and we get the result as `None`.
Now, there will be cases when we pass integer values by mistake while creating a `tf.Variable` instance. To mitigate such issues, we can always define the data type explicitly, whether we pass an integer or a floating-point value. This works in all cases, and we no longer need to worry whether we pass 30 or 30.0 to `tf.Variable`.
Let’s take a look at a simple example.
```python
x = tf.Variable(30, dtype=tf.float32)
print(x)

with tf.GradientTape() as tape:
    y = x**3

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
- On line 1, we create a TensorFlow `Variable` with the value 30. With this, we also assign the `dtype` as `tf.float32` explicitly.
- In the lines that follow, we carry out the usual gradient calculation within the `tf.GradientTape()` context.
The following is the output.
```
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=30.0>
tf.Tensor(2699.9998, shape=(), dtype=float32)
```
We can see that `x` is indeed of the `float32` data type and that its value has been converted from 30 to 30.0 automatically. The rest of the code works as expected, and we get the gradient result as well.

This shows that by taking care of the data type ourselves, we can avoid many unexpected errors while calculating gradients.
Using GradientTape on a 1D Tensor Variable
Until now, all the examples of gradient calculation were on rank-0 tensors, that is, scalars. But we can calculate gradients using `GradientTape` on tensors of any rank.
For example, the following code block shows how to use GradientTape on a 1D tensor.
```python
x = tf.Variable([1.0, 2.0, 3.0], dtype=tf.float32)

with tf.GradientTape() as tape:
    y = x**3
    print(y)

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
There is nothing special about the above code. Instead of a rank-0 tensor, we are just creating a `tf.Variable` instance holding a rank-1 tensor. Then we calculate the gradient within the `tf.GradientTape` context. The following block shows the result.
```
tf.Tensor([ 1.  8. 27.], shape=(3,), dtype=float32)
tf.Tensor([ 3. 12. 27.], shape=(3,), dtype=float32)
```
`y` is also a 1-dimensional tensor, where each value is the cube of the corresponding value of the tensor `x`. The gradient is a 1-dimensional tensor as well, where each value is the derivative \(3x^2\) of the corresponding element. That's all there is to it while dealing with gradients of tensors with ranks greater than 0. You can try a few more examples with rank-2 and rank-3 tensors and see what kind of results you get, as in the sketch below.
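For instance, a rank-2 version (my own quick sketch, not from the original notebook) would look like this, with the gradient computed element-wise just as in the 1D case:

```python
x = tf.Variable([[1.0, 2.0], [3.0, 4.0]], dtype=tf.float32)

with tf.GradientTape() as tape:
    y = x**2

# each gradient value is 2x for the corresponding element
dy_dx = tape.gradient(y, x)
print(dy_dx)  # [[2. 4.] [6. 8.]]
```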
GradientTape on a tf.constant() Tensor
Until now, we have seen how GradientTape works with TensorFlow `Variable`s (`tf.Variable()`), whose operations and values are tracked by default.
But we can also create constant tensors in TensorFlow using `tf.constant()`. The operations of constant tensors are not tracked by default. So, how does `GradientTape` behave in such cases while calculating the gradients? Let's check with a simple example.
```python
x = tf.constant(3.0)
print(x)

with tf.GradientTape() as tape:
    y = x**2 # dy = 2x * dx

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
- On line 1, we create a constant tensor with the value 3.0. It will have `dtype=tf.float32`.
- Starting from line 4, we calculate the gradient of `y` with respect to `x` within the `tf.GradientTape()` context.
So, what is the final result that we get?
```
tf.Tensor(3.0, shape=(), dtype=float32)
None
```
The result is `None`. Unlike `Variable` tensors, the `GradientTape()` API does not watch the operations of constant tensors by default. That's why the operations are not tracked, and the gradients are not calculated either.
How do we solve this issue? We can explicitly tell the `GradientTape()` API to watch the tensor `x`. After that, all its operations will be tracked. Take a look at the following code block.
```python
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    # tell the GradientTape() API to watch the tensor `x` explicitly
    tape.watch(x)
    y = x**2 # dy = 2x * dx

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
Take a look at line 5 closely. We are using `tape.watch()` to watch and track the operations of the tensor `x`. And the following is the result.
```
tf.Tensor(6.0, shape=(), dtype=float32)
```
The gradients are now calculated as expected. You can use `tape.watch()` on `tf.Variable()` tensors as well, but that is not required, as their operations are tracked by default.
Controlling Trainable Variables
Here, we will learn how to control which trainable and non-trainable variables `GradientTape()` watches.
`tf.Variable` has a `trainable` parameter that is `True` by default. This ensures that all the operations of the variable are tracked. But sometimes we may not need the variable to be trainable; this can vary from one use case to another. For this, we can pass `trainable=False` while creating the variable. In such cases, the `tf.GradientTape()` API does not `watch()` the variable's operations anymore.
```python
x = tf.Variable(5.0, trainable=False)

# GradientTape() does not watch non-trainable parameters/variables
with tf.GradientTape() as tape:
    y = x**3

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
```
None
```
You see, we get the result as `None`, as `GradientTape()` does not watch the operations on `x` anymore.
We can still bypass this by telling the `tf.GradientTape()` API to watch the variable, just as we did in the case of constant tensors.
```python
x = tf.Variable(5.0, trainable=False)

# we need to watch non-trainable variables explicitly
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x**3

dy_dx = tape.gradient(y, x)
print(dy_dx)
```
```
tf.Tensor(74.99999, shape=(), dtype=float32)
```
By using `tape.watch(x)`, the operations are now tracked, and we are able to calculate the gradient of `y` with respect to `x`.
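Where does this come in handy? One common case is freezing part of a model. As a small sketch (my own example, not from the original notebook), we can mix a trainable and a non-trainable variable in the same tape; only the trainable one gets a gradient:

```python
w = tf.Variable(2.0)                   # trainable=True by default
b = tf.Variable(1.0, trainable=False)  # frozen

with tf.GradientTape() as tape:
    y = w * 3.0 + b

# only the watched (trainable) variable gets a gradient
dw, db = tape.gradient(y, [w, b])
print(dw)  # tf.Tensor(3.0, shape=(), dtype=float32)
print(db)  # None, since `b` is not watched
```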
Watching Accessed Variables
All the `tf.Variable()` instances that have `trainable=True` are accessed by `tf.GradientTape()` for tracking operations and calculating gradients. This is the default behavior.
```python
x0 = tf.Variable(5.0)

# watch_accessed_variables is True by default
with tf.GradientTape(watch_accessed_variables=True) as tape:
    y0 = x0**2

dy_dx0 = tape.gradient(y0, x0)
print(dy_dx0)
```
```
tf.Tensor(10.0, shape=(), dtype=float32)
```
In the above code example, we are passing `watch_accessed_variables=True`, which is the default, so we need not actually specify it.
But what if we have trainable variables but do not want the `tf.GradientTape()` API to track their operations for gradient calculation? We can simply pass `watch_accessed_variables=False` to the GradientTape context.
```python
x0 = tf.Variable(5.0)

# we can tell GradientTape not to watch any accessed variables
with tf.GradientTape(watch_accessed_variables=False) as tape:
    y0 = x0**2

dy_dx0 = tape.gradient(y0, x0)
print(dy_dx0)
```
```
None
```
And we get the result as `None`.
Now, the above scenario can be helpful when we have more than one trainable variable but only want to compute the gradients of the selected ones.
In such cases, we can use `tape.watch()` to watch those tensors whose gradients we want to calculate. For now, let's try that out with one variable.
```python
x0 = tf.Variable(5.0)

# choose which variables to watch
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(x0)
    y0 = x0**2

dy_dx0 = tape.gradient(y0, x0)
print(dy_dx0)
```
```
tf.Tensor(10.0, shape=(), dtype=float32)
```
After passing `x0` to `tape.watch()`, we can calculate the gradients even though `watch_accessed_variables` is `False`.
Computing Gradient of More Than One Variable with persistent=True
By default, `GradientTape` releases the resources it holds as soon as we call the `GradientTape.gradient()` method. So, we cannot call `gradient()` more than once within the same context by default. Instead, we can pass `persistent=True` while creating the `GradientTape()` context, which allows us to compute gradients multiple times.
```python
x0 = tf.Variable(5.0)
x1 = tf.Variable(10.0)

# `persistent=False` by default
with tf.GradientTape() as tape:
    y0 = x0**2
    y1 = x1**2

dy_dx0 = tape.gradient(y0, x0)
dy_dx1 = tape.gradient(y1, x1)
print(dy_dx0)
print(dy_dx1)
```
You will get the following error with the above code.
```
RuntimeError: A non-persistent GradientTape can only be used to compute one set of gradients (or jacobians)
```
The following code block shows the same example with `persistent=True`.
```python
x0 = tf.Variable(5.0)
x1 = tf.Variable(10.0)

# using `persistent=True`
with tf.GradientTape(persistent=True) as tape:
    y0 = x0**2
    y1 = x1**2

dy_dx0 = tape.gradient(y0, x0)
dy_dx1 = tape.gradient(y1, x1)
print(dy_dx0)
print(dy_dx1)
```
```
tf.Tensor(10.0, shape=(), dtype=float32)
tf.Tensor(20.0, shape=(), dtype=float32)
```
And we get the expected results.
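One small note: since a persistent tape holds on to its resources until it is garbage collected, the TensorFlow documentation recommends dropping the reference to the tape once you are done with it:

```python
# release the resources held by the persistent tape
del tape
```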
Combining Everything We Learned into a Single Code Block
Let’s combine the things we have learned so far into a single code block so that every concept will be in one place.
```python
import tensorflow as tf

# TensorFlow GradientTape on a floating-point variable
x = tf.Variable(20.0, dtype=tf.float32)
print(x)
with tf.GradientTape() as tape:
    y = x**2 # dy = 2x * dx = 2*20.0 = 40.0
dy_dx = tape.gradient(y, x)
print(f"TensorFlow GradientTape on a floating variable: {dy_dx}\n")

# TensorFlow GradientTape on an integer variable
x = tf.Variable(30)
print(x)
with tf.GradientTape() as tape:
    y = x**3 # resulting `y` will be `tf.int32` as well
    print(y)
dy_dx = tape.gradient(y, x)
print(f"TensorFlow GradientTape on an integer variable: {dy_dx}\n")

# GradientTape on a 1D tensor variable
x = tf.Variable([1.0, 2.0, 3.0], dtype=tf.float32)
with tf.GradientTape() as tape:
    y = x**3
    print(y)
dy_dx = tape.gradient(y, x)
print(f"GradientTape on a 1D tensor variable: {dy_dx}\n")

# GradientTape on a constant tensor
x = tf.constant(3.0)
with tf.GradientTape() as tape:
    # tell the GradientTape() API to watch the tensor `x` explicitly
    tape.watch(x)
    y = x**2 # dy = 2x * dx
dy_dx = tape.gradient(y, x)
print(f"GradientTape on a constant tensor: {dy_dx}\n")

"""
Controlling trainable variables
"""
x = tf.Variable(5.0, trainable=False)
# GradientTape() does not watch non-trainable parameters/variables
with tf.GradientTape() as tape:
    y = x**3
dy_dx = tape.gradient(y, x)
print(f"GradientTape does not watch non-trainable variables: {dy_dx}")

x = tf.Variable(5.0, trainable=False)
# we need to watch non-trainable variables
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x**3
dy_dx = tape.gradient(y, x)
print(f"We need to tell GradientTape to watch non-trainable variables explicitly: {dy_dx}\n\n")

"""
GradientTape on more than one variable with `persistent=True`
"""
x0 = tf.Variable(5.0)
x1 = tf.Variable(10.0)
# using `persistent=True`
with tf.GradientTape(persistent=True) as tape:
    y0 = x0**2
    y1 = x1**2
dy_dx0 = tape.gradient(y0, x0)
dy_dx1 = tape.gradient(y1, x1)
print(dy_dx0)
print(dy_dx1)
```
This brings us to the end of the coding part of this tutorial.
Summary and Conclusion
In this tutorial, we covered the very basics of the TensorFlow GradientTape API. We did not cover everything; there is a lot more to explore in GradientTape. But this should act as a good starting point for anyone who is just starting to learn TensorFlow. I hope that you learned something new in this tutorial.
If you have any doubts, thoughts, or suggestions, please leave them in the comment section. I will surely address them.
You can contact me using the Contact section. You can also find me on LinkedIn, and Twitter.
Hello!
Excellent tutorial I must say. Can you possibly help me figure out how to do nested gradients with respect to multiple variables. I can’t figure out how to take a gradient of a gradient with respect to different variables.
Thanks for your time!
Hello Travis. Glad that you liked the tutorial.
So, are you looking for something like tf.GradientTape() inside another tf.GradientTape() context?
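If so, a minimal sketch of a second derivative with nested tapes (note that the inner gradient has to be computed inside the outer context) would look something like this:

```python
x = tf.Variable(3.0)

with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x**3
    # first derivative: dy/dx = 3x^2 = 27.0
    dy_dx = inner_tape.gradient(y, x)

# second derivative: d2y/dx2 = 6x = 18.0
d2y_dx2 = outer_tape.gradient(dy_dx, x)
print(dy_dx)
print(d2y_dx2)
```

For multiple variables, you can pass a list of sources to either `gradient()` call, just like with a single tape.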