Back-Propagation Spelled Out - As Explained by Karpathy

Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a tool that makes generating API docs from your code ridiculously easy.
Adding Labels To Improve Graph Readability
Add a label parameter to the Value class:
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)  # nodes this value was computed from
        self._op = _op               # operation that produced this value
        self.label = label           # human-readable name shown in the graph

    def __repr__(self):
        return f"Value(data={self.data})"

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), '+')

    def __mul__(self, other):
        # the op symbol for multiplication is '*', not '-'
        return Value(self.data * other.data, (self, other), '*')
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
print(d._prev)
print(d._op)
print("---")
print(e._prev)
print(e._op)
Update draw_dot to include the label in the graph
Originally we had the node expression as:
dot.node(name=uid, label="{ data %.4f }" % (n.data,), shape='record')
Replace with:
dot.node(name=uid, label="{ %s | data %.4f }" % (n.label, n.data), shape='record')
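The draw_dot function itself is not shown in this section. As a rough sketch of what such a function needs, the snippet below walks the graph from the output node back to the leaves and builds the same record-style label string for each node. The helper names trace and node_label are assumptions made for illustration, and the Value class is a pared-down stand-in for the one above. No Graphviz calls are made here.

```python
class Value:
    """Pared-down stand-in for the Value class above (illustrative only)."""
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), '+')

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), '*')


def trace(root):
    """Collect every node and (child, parent) edge reachable from root."""
    nodes, edges = set(), set()

    def build(v):
        if v not in nodes:
            nodes.add(v)
            for child in v._prev:
                edges.add((child, v))
                build(child)

    build(root)
    return nodes, edges


def node_label(n):
    """Build the record-style label text (hypothetical helper)."""
    return "{ %s | data %.4f }" % (n.label, n.data)


a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'

nodes, edges = trace(d)
labels = sorted(node_label(n) for n in nodes)
for text in labels:
    print(text)
```

Each printed string is what would be passed as the label argument of dot.node for that node.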
Now draw_dot(d) renders the graph with each node's label shown next to its data.
Re-Render graph with Labels
Let's add two more nodes, f and L, to the expression:
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L
Generate graph:
draw_dot(L)
The graph built above represents the forward pass: each node's value is computed from its inputs, ending at the output L.
What We Want to Calculate
We want to know how each node affects the output (the loss function L). That covers the leaf inputs a, b, c, f, which play the role of weights, and the intermediate values e and d. So we want to find: dL/dL, dL/df, dL/de, dL/dd, dL/dc, dL/db, dL/da.
Add the grad attribute to accommodate backpropagation
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label
        self.grad = 0.0  # 0 means no impact on output to start with
Update the node graphics information
dot.node(name=uid, label="{ %s | data %.4f | grad %.4f }" % (n.label, n.data, n.grad), shape='record')
Manually Performing Back-Propagation for The Given Graph
Node L
What is dL/dL? That is, if we change L by a tiny amount, how does the output L change? By exactly the same amount, so the answer is 1.
That is,
L.grad = 1
The Expression
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L
Node d
L = d * f
By known rules:
dL/dd = f
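By derivation, using the definition of the derivative with L treated as a function of d (and h approaching 0):
dL/dd =
(L(d+h) - L(d))/h =
((d+h)*f - d*f)/h =
(d*f + h*f - d*f)/h =
h*f/h =
f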
That is, dL/dd = f = -2.0
So we set:
d.grad = -2.0
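The derivation above can be checked numerically with plain floats. This is a minimal sketch, assuming a small step h = 0.0001 (an arbitrary choice) and a hypothetical helper named loss. It is not part of the Value class.

```python
def loss(d, f):
    """L = d * f, written with plain floats instead of Value objects."""
    return d * f


d, f = 4.0, -2.0   # values of the d and f nodes in the graph above
h = 0.0001         # small step; assumed value, any small number behaves similarly

# Difference quotient ((d + h) * f - d * f) / h
dL_dd = (loss(d + h, f) - loss(d, f)) / h
print(dL_dd)  # close to f = -2.0, up to floating-point rounding
```

Because L is linear in d, the quotient matches f for any small h, apart from rounding error.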
Node f
By symmetry, we get that dL/df = d = 4.0
That is,
f.grad = 4.0
The updated graph now shows these gradients on L, d, and f.
How to do Numerical Verification of the Derivatives
def verify_dL_by_df():
    h = 0.001

    # baseline forward pass
    a = Value(2.0, label='a')
    b = Value(-3.0, label='b')
    c = Value(10, label='c')
    e = a * b; e.label = 'e'
    d = e + c; d.label = 'd'
    f = Value(-2.0, label='f')
    L = d * f; L.label = 'L'
    L1 = L.data

    # same forward pass with f nudged by h
    a = Value(2.0, label='a')
    b = Value(-3.0, label='b')
    c = Value(10, label='c')
    e = a * b; e.label = 'e'
    d = e + c; d.label = 'd'
    f = Value(-2.0 + h, label='f')  # bump f a little bit
    L = d * f; L.label = 'L'
    L2 = L.data

    print((L2 - L1)/h)

verify_dL_by_df()  # prints a value very close to 4.0
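The same bump-and-compare idea extends to every leaf input. The sketch below is one possible generalization, assuming plain floats and two hypothetical helpers, forward and numeric_grad, that do not appear in the original code.

```python
def forward(a, b, c, f):
    """Recompute the loss L = (a * b + c) * f from the four leaf inputs."""
    e = a * b
    d = e + c
    return d * f


def numeric_grad(inputs, name, h=0.001):
    """Estimate dL/d(name) by bumping a single input by h."""
    bumped = dict(inputs)
    bumped[name] += h
    return (forward(**bumped) - forward(**inputs)) / h


inputs = {'a': 2.0, 'b': -3.0, 'c': 10.0, 'f': -2.0}
grads = {name: numeric_grad(inputs, name) for name in inputs}
for name, g in grads.items():
    print(f"dL/d{name} ~ {g:.4f}")
```

Each estimate should agree, up to rounding, with the corresponding hand-derived gradient.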
The Challenge - How do we calculate dL/dc?
We know dL/dd = -2.0, so we know how L is affected by d.
The question is how c impacts L through d.
First, we can calculate the "local derivative", that is, figure out how c impacts d.
That is,
dd/dc = ?
We know that:
d = c + e
So once we differentiate with respect to c, we get: dd/dc = 1
Similarly, dd/de = 1.
Now the question is how to combine dd/dc and dL/dd.
We need the Chain Rule: if z depends on y, and y depends on x, then dz/dx = dz/dy * dy/dx.
So, applying the chain rule, we get:
dL/dc = dL/dd * dd/dc
dL/dc = -2.0 * 1.0 = -2.0
Similarly, dL/de = -2.0
Let's set the values in Python and redraw the graph:
c.grad = -2.0
e.grad = -2.0
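To see the chain rule at work on the addition node, the sketch below estimates the local derivative dd/dc numerically, multiplies it by dL/dd, and compares the product with a direct estimate of dL/dc. It uses plain floats rather than Value objects, and the step size h is an assumed value.

```python
h = 0.0001                      # assumed small step
a, b, c, f = 2.0, -3.0, 10.0, -2.0

e = a * b                       # -6.0
d = e + c                       #  4.0
L = d * f                       # -8.0

# Local derivative of the addition node: how d responds to a bump in c
dd_dc = ((e + (c + h)) - d) / h

# Upstream derivative: how L responds to a bump in d
dL_dd = ((d + h) * f - L) / h

# Chain rule: multiply the two pieces together
dL_dc_chain = dL_dd * dd_dc

# Direct estimate: bump c and recompute L from scratch
dL_dc_direct = (((a * b) + (c + h)) * f - L) / h

print(dd_dc, dL_dd, dL_dc_chain, dL_dc_direct)
```

The addition node has a local derivative of 1, so it passes the upstream gradient through unchanged. That is why c and e both end up with the same gradient as d.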
Figuring out dL/da and dL/db
We know:
dL/de = -2.0
We want to know:
dL/da = dL/de * de/da
We know that:
e = a * b
de/da = b
de/da = b = -3.0
We can also find:
e = a * b
de/db = a
de/db = a = 2.0
So, now to get what we need:
dL/da = dL/de * de/da = -2.0 * -3.0 = 6.0
dL/db = dL/de * de/db = -2.0 * 2.0 = -4.0
We set the values in Python and redraw to get the full graph:
a.grad = 6.0
b.grad = -4.0
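Putting the steps together, here is a compact sketch that computes every gradient from the node values rather than typing the numbers in by hand. It assumes a trimmed-down Value class with only the fields needed here. The order of the assignments mirrors the manual walk from L back to a and b.

```python
class Value:
    """Trimmed-down Value with a grad field (illustrative stand-in)."""
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label
        self.grad = 0.0

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), '+')

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), '*')


a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'

L.grad = 1.0                 # dL/dL
d.grad = L.grad * f.data     # L = d * f  ->  dL/dd = f
f.grad = L.grad * d.data     # L = d * f  ->  dL/df = d
c.grad = d.grad * 1.0        # d = e + c  ->  dd/dc = 1
e.grad = d.grad * 1.0        # d = e + c  ->  dd/de = 1
a.grad = e.grad * b.data     # e = a * b  ->  de/da = b
b.grad = e.grad * a.data     # e = a * b  ->  de/db = a

for n in (L, f, d, e, c, b, a):
    print(n.label, n.grad)
```

Every line after L.grad has the same shape: the gradient already known for the node's output, times the local derivative of that node with respect to the input.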