Modeling a Neuron in micrograd (As Explained by Karpathy)

Hi there! I'm Shrijith Venkatrama, founder of Hexmos. Right now, I’m building LiveAPI, a tool that makes generating API docs from your code ridiculously easy.
Modeling a Neuron
In serious neural network implementations, we model the neuron in the following way:
- Input x0 (axon)
- Weight w0 (synapse)
- "Influence" x0*w0 (dendrite)
- Sum of "influences" = x0*w0 + x1*w1 + ... (cell body)
- Bias b
The above leads to the cell body expression:
\sum (x_i \cdot w_i) + b
We also have:
- Activation function: a squashing function (tanh, sigmoid)
- The output of the axon is then: f(\sum (x_i \cdot w_i) + b)
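Before switching to micrograd, here is a minimal plain-Python sketch of that expression, with tanh as the squashing function f. The input, weight, and bias values are made up purely for illustration:
import math
# hypothetical inputs, weights and bias, only to illustrate the expression
xs = [2.0, 0.0]   # inputs x0, x1
ws = [-3.0, 1.0]  # weights w0, w1
b = 6.7           # bias
# cell body: sum of "influences" plus the bias
cell_body = sum(x * w for x, w in zip(xs, ws)) + b
# output of the axon: f(sum(x_i * w_i) + b), with f = tanh
out = math.tanh(cell_body)
print(cell_body, out)  # 0.7 and tanh(0.7) ≈ 0.604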
Representing the Model Neuron (defined above) in micrograd
# inputs x1, x2
x1 = Value(2.0, label='x1')
x2 = Value(0.0, label='x2')
# weights w1, w2
w1 = Value(-3.0, label='w1')
w2 = Value(1.0, label='w2')
# bias of the neuron
b = Value(6.7, label='b')
x1w1 = x1 * w1; x1w1.label = 'x1*w1'
x2w2 = x2 * w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'
n = x1w1x2w2 + b; n.label = 'n'
draw_dot(n)
Result:
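With these inputs, the forward pass gives n = (2.0)(-3.0) + (0.0)(1.0) + 6.7 = 0.7, which is the value the n node should show in the rendered graph.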
Implementing tanh into Value (for the Activation Function)
We have the following tanh formula:
tanh(x) = (exp(2x) - 1) / (exp(2x) + 1)
We can implement the function as follows:
class Value:
    ...
    def tanh(self):
        x = self.data
        t = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)
        out = Value(t, (self, ), 'tanh')
        return out
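As a quick sanity check (a hypothetical snippet, assuming the Value class above is defined with its usual data field), the exponential form should agree with the standard library:
import math
v = Value(0.7)
print(v.tanh().data)   # ≈ 0.6044
print(math.tanh(0.7))  # ≈ 0.6044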
We'll add a new node o, which is tanh(n):
# inputs x1, x2
x1 = Value(2.0, label='x1')
x2 = Value(0.0, label='x2')
# weights w1, w2
w1 = Value(-3.0, label='w1')
w2 = Value(1.0, label='w2')
# bias of the neuron
b = Value(6.8813735870195432, label='b')
x1w1 = x1 * w1; x1w1.label = 'x1*w1'
x2w2 = x2 * w2; x2w2.label = 'x2*w2'
x1w1x2w2 = x1w1 + x2w2; x1w1x2w2.label = 'x1*w1 + x2*w2'
n = x1w1x2w2 + b; n.label = 'n'
o = n.tanh(); o.label = 'o'
draw_dot(o)
And we get:
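The odd-looking bias 6.8813735870195432 is chosen so the numbers come out nicely: the forward pass gives n = (2.0)(-3.0) + (0.0)(1.0) + 6.8813735870195432 ≈ 0.8814, and o = tanh(0.8814) ≈ 0.7071.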
Derivative of o - Derivative of tanh
The formula for the derivative of tanh is the following:
d/dx tanh(x) = 1 - tanh(x)**2
So, we want to find out do/dn:
do/dn = 1 - tanh(n)**2 = 1 - o**2
We know that do/do = 1, so o.grad = 1.
To find do/dn, we plug in the value of o: do/dn = 1 - o.data**2 = 1 - 0.7071**2 ≈ 0.5.
Therefore:
n.grad = 0.5
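One way to double-check that number (a hypothetical finite-difference check, not part of the original walkthrough) is to nudge n slightly and watch how tanh responds:
import math
# finite-difference check of do/dn at n ≈ 0.8814
n_val = 0.8813735870195432
h = 0.0001
numeric = (math.tanh(n_val + h) - math.tanh(n_val)) / h
analytic = 1 - math.tanh(n_val)**2
print(numeric, analytic)  # both ≈ 0.5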
Getting all the backprop values calculated (manually)
We leverage some patterns we've learned previously about how backprop works with addition/multiplication to quickly fill in the values for grad in each node:
o.grad = 1
n.grad = 1 - o.data**2
## addition - grad just flows through to previous stages
x1w1x2w2.grad = n.grad
b.grad = n.grad
x2w2.grad = x1w1x2w2.grad
x1w1.grad = x1w1x2w2.grad
## multiplication - element.grad = sibling.data * next.grad
x2.grad = w2.data * x2w2.grad
w2.grad = x2.data * x2w2.grad
x1.grad = w1.data * x1w1.grad
w1.grad = x1.data * x1w1.grad
draw_dot(o)
Result:
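For these particular inputs, the manual fill-in works out to concrete numbers, which the rendered graph should match: the addition chain passes n.grad = 0.5 straight through, so x1w1x2w2.grad = b.grad = x1w1.grad = x2w2.grad = 0.5, and the multiplication rule then gives x2.grad = 1.0 * 0.5 = 0.5, w2.grad = 0.0 * 0.5 = 0.0, x1.grad = -3.0 * 0.5 = -1.5, and w1.grad = 2.0 * 0.5 = 1.0.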
Reference
The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube