
An Alternative Approach: Recursive Backpropagation Without Topological Sorting

Open wahabaftab opened this issue 1 year ago • 1 comments

Very intuitive and simple, @karpathy! I was coding in parallel while watching your video and took a slightly different approach.

  • For computing gradients of + and *, I used the fundamental derivative formula:

$$ f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} $$

  • Instead of using topological sorting for backpropagation, I implemented a recursive approach, where each parent node visits its child nodes and propagates gradients to them. This method is probably less efficient, since it can recompute gradients for child nodes multiple times when gradients flow along multiple paths, but it is still a valid alternative that produces the same results.
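To make the two points above concrete, here is a minimal sketch of how they might fit together. It is not taken from the linked repo: the `Value` class, its `_local_grads` field, and `numeric_derivative` are hypothetical names chosen for illustration. Each node stores the local derivative with respect to each child, `backward` recurses along every path, and the result is compared against a finite-difference approximation with a small fixed `h`.

```python
class Value:
    """Hypothetical minimal scalar node, loosely modeled on micrograd's Value."""

    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._prev = tuple(children)
        self._local_grads = tuple(local_grads)  # d(self)/d(child), one per child

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self, upstream=1.0):
        # Recursive backprop: add this path's contribution, then recurse.
        # A node reachable along several paths is visited once per path,
        # which is the repeated work mentioned above.
        self.grad += upstream
        for child, local in zip(self._prev, self._local_grads):
            child.backward(upstream * local)


def numeric_derivative(f, a, h=1e-6):
    """Forward-difference approximation of the derivative of f at a."""
    return (f(a + h) - f(a)) / h


a = Value(3.0)
b = a * a + a  # a reaches the output along three paths
b.backward()

approx = numeric_derivative(lambda x: x * x + x, 3.0)
print(a.grad, approx)  # analytic gradient vs. finite-difference estimate
```

Because every path is walked separately, the number of visits can grow exponentially with depth in graphs that reuse intermediate nodes heavily, whereas a topological ordering processes each node once.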

Link to my repo: https://github.com/wahabaftab/micrograd/

wahabaftab avatar Mar 25 '25 21:03 wahabaftab

You don't want to use recursion, as you can run into stack overflow errors (a `RecursionError` in Python) for large networks. Stick to iteration:

def trace(root):
    """
    Builds the vertices and edges of the graph reachable from root.
    Returns a tuple (vertices, edges), where edges is a list of
    (child, parent) pairs.

    Basically an iterative DFS (explicit stack) with no destination in mind.
    """
    stack = [root]
    vertices = []
    seen = set()  # O(1) membership test instead of scanning the list
    edges = []

    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            vertices.append(v)
            for v_child in v._prev:
                edges.append((v_child, v))
                stack.append(v_child)

    return vertices, edges
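The traversal above only collects vertices and edges; it does not order them for backpropagation. One possible way to finish the job without recursion is Kahn's algorithm over the `(child, parent)` edges, sketched below. The `Value` class here is a hypothetical stand-in that mimics micrograd's `data`, `grad`, `_prev` and `_backward` attributes, and `backward_iterative` is a name made up for this example.

```python
from collections import defaultdict, deque


class Value:
    """Hypothetical stand-in for micrograd's Value (data, grad, _prev, _backward)."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(children)
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out


def trace(root):
    """Iterative DFS returning (vertices, edges), edges as (child, parent) pairs."""
    stack, vertices, seen, edges = [root], [], set(), []
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            vertices.append(v)
            for v_child in v._prev:
                edges.append((v_child, v))
                stack.append(v_child)
    return vertices, edges


def backward_iterative(root):
    """Kahn's algorithm: run a node's _backward only after all its parents ran."""
    vertices, edges = trace(root)
    children = defaultdict(list)
    pending = {v: 0 for v in vertices}  # parents whose _backward has not run yet
    for child, parent in edges:
        children[parent].append(child)
        pending[child] += 1

    root.grad = 1.0
    queue = deque([root])
    while queue:
        v = queue.popleft()
        v._backward()
        for child in children[v]:
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)


a = Value(3.0)
b = a * a + a
backward_iterative(b)
print(a.grad)

# A chain deeper than CPython's default recursion limit (typically 1000).
x = Value(1.0)
y = x
for _ in range(5000):
    y = y + x
backward_iterative(y)
print(x.grad)
```

Since neither the traversal nor the backward pass uses the call stack, the depth of the graph is limited only by memory, and each node's `_backward` runs exactly once.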

dhern023 avatar May 26 '25 21:05 dhern023