An Alternative Approach: Recursive Backpropagation Without Topological Sorting
Very intuitive and simple, @karpathy! I was coding in parallel while watching your video and took a slightly different approach.
- For computing the gradients of `+` and `*`, I used the fundamental derivative formula (a small numerical check is sketched after this list):
$$ f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} $$
- Instead of using topological sorting for backpropagation, I implemented a recursive approach in which each parent node visits its child nodes and pushes gradients into them. This is probably less efficient, since a child node's gradient contribution is recomputed once per path when gradients flow to it from multiple parents, but it is a valid alternative that produces the same results (sketched after the repo link below).
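To make the first bullet concrete: a quick numerical check along these lines (illustrative only, not the exact code in my repo) shows the forward-difference form of the limit recovering the local gradients of `+` and `*`:

```python
# Illustrative sketch: estimate the local gradients of + and * with the
# forward-difference form of the limit definition, then compare them to
# the analytic values (1 for +, the other operand for *).
def numerical_grad(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

a, b = 3.0, -4.0

d_add_da = numerical_grad(lambda x: x + b, a)   # analytic: 1.0
d_mul_da = numerical_grad(lambda x: x * b, a)   # analytic: b = -4.0

print(d_add_da, d_mul_da)  # approximately 1.0 -4.0
```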
Link to my repo: https://github.com/wahabaftab/micrograd/
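Roughly, the recursive scheme looks like the sketch below. This is a simplified illustration rather than the code in the repo: it assumes each `Value` stores `(child, local_grad)` pairs instead of micrograd's `_backward` closure, and it accumulates one contribution per path through the graph, which is why shared nodes get revisited.

```python
class Value:
    """Minimal stand-in for illustration; local derivatives are stored per
    child at forward time (micrograd stores a _backward closure instead)."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._prev = list(children)  # list of (child, d(self)/d(child)) pairs

    def __add__(self, other):
        return Value(self.data + other.data, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Value(self.data * other.data, [(self, other.data), (other, self.data)])

def backprop(v, upstream=1.0):
    # Add this path's contribution, then recurse into the children.
    # A node shared by several paths is revisited once per path, so the
    # contributions still sum correctly, at the cost of repeated work.
    v.grad += upstream
    for child, local in v._prev:
        backprop(child, upstream * local)

# usage: gradients of L = a*b + a
a, b = Value(2.0), Value(-3.0)
L = a * b + a
backprop(L)
print(a.grad, b.grad)  # -2.0 2.0
```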
You don't want to use recursion, as you can run into stack overflow errors for large networks. Stick to iteration:
def trace(root):
    """
    Builds the lists of vertices and edges of the graph rooted at `root`.
    Returns a tuple (vertices, edges), where each edge is a (child, parent) pair.
    Basically an iterative DFS with no destination in mind.
    """
    stack = [root]
    vertices = []
    edges = []
    while stack:
        v = stack.pop()
        if v not in vertices:           # visit each node only once
            vertices.append(v)
            for v_child in v._prev:     # record the edge child -> parent
                edges.append((v_child, v))
                stack.append(v_child)
    return vertices, edges
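The backward pass itself can then stay iterative as well. Here is a rough sketch (assuming micrograd-style `Value` objects with `.grad`, `._prev`, and a `._backward` closure): Kahn's algorithm over the edges from `trace` yields the nodes in reverse topological order, so each node fires `_backward` only after every node that consumes it has contributed its gradient.

```python
from collections import defaultdict, deque

def backward_iterative(root):
    """Iterative backprop sketch built on trace() above; assumes
    micrograd-style Value objects with .grad, ._prev and ._backward."""
    vertices, edges = trace(root)

    # pending[v] = number of consumers of v that have not fired _backward yet
    pending = defaultdict(int)
    for child, _parent in edges:
        pending[child] += 1

    root.grad = 1.0
    queue = deque(v for v in vertices if pending[v] == 0)  # just the root
    while queue:
        v = queue.popleft()
        v._backward()                # push v.grad into the nodes in v._prev
        for child in v._prev:
            pending[child] -= 1
            if pending[child] == 0:  # all of child's consumers have fired
                queue.append(child)
```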