
Chapter 4 - How is weight_delta computed?

Open ghost opened this issue 5 years ago • 6 comments

Hello,

I have finished Chapter 4, but I have a question about weight_delta. It has a value of delta * input. The book says that weight_delta is the derivative of the error, right? On page 60, for example, the error is error = ((0.5 * weight) - 0.8) ** 2

When I give this error function to Wolfram Alpha, it gives me the derivative 0.5 * x - 0.8 (where x = weight). So, in general, the derivative of the error should be input * weight - goal_pred.

So why do they use delta * input for weight_delta if weight_delta is the derivative?
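
A quick finite-difference check makes the discrepancy concrete (a minimal sketch using the book's page-60 values; the starting weight below is made up):

```python
# Numerical check of d(error)/d(weight) for error = ((input * weight) - goal_pred) ** 2,
# using the book's example values input = 0.5 and goal_pred = 0.8.
input, goal_pred = 0.5, 0.8
weight = 0.3  # arbitrary starting weight, just for the check

def error(w):
    return ((input * w) - goal_pred) ** 2

# central finite-difference approximation of the derivative
h = 1e-6
numeric_grad = (error(weight + h) - error(weight - h)) / (2 * h)

pred = input * weight
delta = pred - goal_pred
print(numeric_grad)       # ~ -0.65, the true derivative
print(2 * delta * input)  # -0.65: 2 * (pred - goal_pred) * input
print(delta * input)      # -0.325: the book's weight_delta (factor of 2 dropped)
```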

ghost avatar Aug 17 '20 20:08 ghost

I think the derivative is 2 * ((0.5 * weight) - 0.8) * 0.5, that is, 2 * 0.5 * ((0.5 * weight) - 0.8), so the result is 0.5 * weight - 0.8.
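
Written out with the chain rule for the page-60 error (just the derivation, not from the book):

$$
\frac{dE}{dw} = \frac{d}{dw}\,(0.5w - 0.8)^2 = 2\,(0.5w - 0.8)\cdot 0.5 = 0.5w - 0.8
$$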

batman47steam avatar Aug 29 '20 03:08 batman47steam

The general formula for this is: 2 * ((input * weight) - goal_pred) * input. In neural networks, people may not care about the exact coefficient of the derivative, so they just omit the 2 and keep the key part of the derivative.
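
The general formula can be checked symbolically; a minimal sketch with sympy (an extra dependency, not something the book uses):

```python
import sympy as sp

# i = input, w = weight, g = goal_pred
i, w, g = sp.symbols('i w g')

error = ((i * w) - g) ** 2
print(sp.diff(error, w))  # 2*i*(i*w - g), i.e. 2 * input * ((input * weight) - goal_pred)
```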

batman47steam avatar Aug 29 '20 03:08 batman47steam

Hmm, if error = (input * weight - goal_pred) ** 2, shouldn't the derivative be 2 * (input * weight - goal_pred)? Since in this example input = 2, it's the same, but I'm also confused...

jpstrube avatar Oct 23 '20 09:10 jpstrube

> Hmm, if error = (input * weight - goal_pred) ** 2, shouldn't the derivative be 2 * (input * weight - goal_pred)? Since in this example input = 2, it's the same, but I'm also confused...

The derivative should be 2 * input * (input * weight - goal_pred); that's the chain rule. You also have to take the derivative of the inner expression input * weight with respect to weight, which contributes the factor input.
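
In symbols, with i = input, w = weight, and g = goal_pred, the outer and inner derivatives multiply:

$$
\frac{dE}{dw} \;=\; \underbrace{2\,(iw - g)}_{\text{outer}} \cdot \underbrace{\frac{d}{dw}(iw - g)}_{\text{inner}\,=\,i} \;=\; 2\,i\,(iw - g)
$$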

batman47steam avatar Nov 07 '20 04:11 batman47steam

As far as I can tell, weight_delta in Chapter 4 is calculated via the delta rule.

Just to clarify: the delta rule is an update rule for single-layer NNs that makes use of gradient descent. Backpropagation is an update rule for multi-layer NNs, also based on gradient descent.
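
For reference, a delta-rule update for one weight might look like this (a minimal sketch in the chapter's style, using the book's input and goal_pred; the starting weight and learning rate are made up):

```python
# Delta-rule update loop for a single weight.
input, goal_pred = 0.5, 0.8
weight, alpha = 0.1, 0.5  # illustrative starting weight and learning rate

for iteration in range(20):
    pred = input * weight
    delta = pred - goal_pred          # prediction error
    weight_delta = delta * input      # delta rule (factor of 2 omitted)
    weight -= alpha * weight_delta    # gradient-descent step
    print(iteration, (pred - goal_pred) ** 2)  # squared error shrinks each step
```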

1vash avatar Oct 23 '21 14:10 1vash

But if we use direction_and_amount = (pred - goal_pred) * input * 2 (i.e., not omitting the 2), doesn't the model converge much faster?
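
Keeping the 2 amounts to doubling the effective learning rate, so any speed-up can also be had by raising alpha; a quick sketch of the equivalence (all values made up):

```python
# Keeping the factor of 2 is equivalent to doubling the learning rate:
# both update rules below produce identical weight trajectories.
input, goal_pred, alpha = 0.5, 0.8, 0.1

w1 = w2 = 0.0
for _ in range(10):
    w1 -= alpha * ((input * w1 - goal_pred) * input * 2)    # full derivative, lr = alpha
    w2 -= (2 * alpha) * ((input * w2 - goal_pred) * input)  # 2 omitted, lr = 2 * alpha

print(w1, w2)  # identical values
```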

chirapok avatar Jun 30 '23 12:06 chirapok