
Chapter 4 - How is weight_delta computed?

Open ghost opened this issue 5 years ago • 6 comments

Hello,

I have finished Chapter 4, but I have a question about weight_delta. It has a value of delta * input. The book says that weight_delta is the derivative of the error, right? On page 60, for example, the error is error = ((0.5 * weight) - 0.8) ** 2

When I give this error function to Wolfram Alpha, it gives me the derivative 0.5 * x - 0.8 (where x = weight). So, in general, the derivative of the error should be input * weight - goal_pred.

So why do they use delta * input for weight_delta if weight_delta is the derivative?
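
A quick finite-difference check makes the discrepancy concrete (a minimal sketch using the book's page-60 values; the starting weight below is made up):

```python
# Numerical check of d(error)/d(weight) for error = ((input * weight) - goal_pred) ** 2,
# using the book's example values input = 0.5 and goal_pred = 0.8.
input, goal_pred = 0.5, 0.8
weight = 0.3  # arbitrary starting weight, just for the check

def error(w):
    return ((input * w) - goal_pred) ** 2

# central finite-difference approximation of the derivative
h = 1e-6
numeric_grad = (error(weight + h) - error(weight - h)) / (2 * h)

pred = input * weight
delta = pred - goal_pred
print(numeric_grad)       # ~ -0.65, the true derivative
print(2 * delta * input)  # -0.65: 2 * (pred - goal_pred) * input
print(delta * input)      # -0.325: the book's weight_delta (factor of 2 dropped)
```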

ghost avatar Aug 17 '20 20:08 ghost

I think the derivative is 2 * ((0.5 * weight) - 0.8) * 0.5, that is, 2 * 0.5 * ((0.5 * weight) - 0.8), so the result is 0.5 * weight - 0.8.
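
Written out with the chain rule for the page-60 error (just the derivation, not from the book):

$$
\frac{dE}{dw} = \frac{d}{dw}\,(0.5w - 0.8)^2 = 2\,(0.5w - 0.8)\cdot 0.5 = 0.5w - 0.8
$$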

batman47steam avatar Aug 29 '20 03:08 batman47steam

The general formula for this is: 2 * ((input * weight) - goal_pred) * input. In neural networks, people may not care about the exact coefficient of the derivative, so they just omit the 2 and keep the key part of the derivative.
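
The general formula can be checked symbolically; a minimal sketch with sympy (an extra dependency, not something the book uses):

```python
import sympy as sp

# i = input, w = weight, g = goal_pred
i, w, g = sp.symbols('i w g')

error = ((i * w) - g) ** 2
print(sp.diff(error, w))  # 2*i*(i*w - g), i.e. 2 * input * ((input * weight) - goal_pred)
```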

batman47steam avatar Aug 29 '20 03:08 batman47steam

Hmm, if error = (input * weight - goal_pred) ** 2, shouldn't the derivative be 2 * (input * weight - goal_pred)? Since in this example input = 2, it's the same, but I'm also confused...

jpstrube avatar Oct 23 '20 09:10 jpstrube

> Hmm, if error = (input * weight - goal_pred) ** 2, shouldn't the derivative be 2 * (input * weight - goal_pred)? Since in this example input = 2, it's the same, but I'm also confused...

The derivative should be 2 * input * (input * weight - goal_pred); that's the chain rule. You also have to take the derivative of the inner expression input * weight with respect to weight, which contributes the factor input.
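
In symbols, with i = input, w = weight, and g = goal_pred, the outer and inner derivatives multiply:

$$
\frac{dE}{dw} \;=\; \underbrace{2\,(iw - g)}_{\text{outer}} \cdot \underbrace{\frac{d}{dw}(iw - g)}_{\text{inner}\,=\,i} \;=\; 2\,i\,(iw - g)
$$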

batman47steam avatar Nov 07 '20 04:11 batman47steam

As far as I can tell, weight_delta in Chapter 4 is calculated via the delta rule.

Just to clarify: the delta rule is an update rule for single-layer NNs that makes use of gradient descent. Backpropagation is an update rule for multi-layer NNs, also based on gradient descent.
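
For reference, a delta-rule update for one weight might look like this (a minimal sketch in the chapter's style, using the book's input and goal_pred; the starting weight and learning rate are made up):

```python
# Delta-rule update loop for a single weight.
input, goal_pred = 0.5, 0.8
weight, alpha = 0.1, 0.5  # illustrative starting weight and learning rate

for iteration in range(20):
    pred = input * weight
    delta = pred - goal_pred          # prediction error
    weight_delta = delta * input      # delta rule (factor of 2 omitted)
    weight -= alpha * weight_delta    # gradient-descent step
    print(iteration, (pred - goal_pred) ** 2)  # squared error shrinks each step
```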

1vash avatar Oct 23 '21 14:10 1vash

But if we use direction_and_amount = (pred - goal_pred) * input * 2 (i.e., not omitting the 2), doesn't the model converge much faster?
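
Keeping the 2 amounts to doubling the effective learning rate, so any speed-up can also be had by raising alpha; a quick sketch of the equivalence (all values made up):

```python
# Keeping the factor of 2 is equivalent to doubling the learning rate:
# both update rules below produce identical weight trajectories.
input, goal_pred, alpha = 0.5, 0.8, 0.1

w1 = w2 = 0.0
for _ in range(10):
    w1 -= alpha * ((input * w1 - goal_pred) * input * 2)    # full derivative, lr = alpha
    w2 -= (2 * alpha) * ((input * w2 - goal_pred) * input)  # 2 omitted, lr = 2 * alpha

print(w1, w2)  # identical values
```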

chirapok avatar Jun 30 '23 12:06 chirapok