
Exercise 7.8 question

Open sharov-am opened this issue 9 months ago • 0 comments

Hi.

Why do you assume that $\theta'(s^{(l)}_j)=1$ when computing $\delta^{(2)}$ and $\delta^{(1)}$? The identity output applies only to the final layer; all previous layers use the $\tanh(x)$ transformation. So we should calculate $\delta^{(2)}$ and $\delta^{(1)}$ as in Example 7.1, given the new value of $\delta^{(3)}$, namely $\delta^{(i)} = \theta'(s^{(i)}) \otimes \left[W^{(i+1)}\delta^{(i+1)}\right]$ for $i=1,2$, where $\theta(s^{(i)}) = \tanh(s^{(i)})$.
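To make the point concrete, here is a minimal sketch of the backward recursion I have in mind: $\tanh$ hidden units (so $\theta'(s) = 1 - \tanh^2(s)$), an identity output unit (so $\theta'=1$ only at the last layer), and squared-error loss. The dictionary layout, variable names, and the bias-dropping convention are my own illustrative assumptions, not code from the book:

```python
import numpy as np

def backprop_deltas(W, s, y):
    """Backward pass for a network with tanh hidden units and an
    identity (linear) output unit, under squared-error loss.

    W[l] : weight matrix mapping layer l-1 (bias row included) to layer l
    s[l] : pre-activation vector of layer l
    y    : target value

    Illustrative sketch only -- names and conventions are assumptions.
    """
    L = len(s)  # index of the output layer
    # Identity output: x^(L) = s^(L) and theta'(s^(L)) = 1,
    # so delta^(L) = 2 (x^(L) - y).
    deltas = {L: 2.0 * (s[L] - y)}
    for l in range(L - 1, 0, -1):
        back = W[l + 1].T @ deltas[l + 1]
        back = back[1:]  # drop the bias component
        # Hidden layers use tanh, so theta'(s) = 1 - tanh(s)**2 here,
        # NOT 1 -- this is the factor the question is about.
        deltas[l] = (1.0 - np.tanh(s[l]) ** 2) * back
    return deltas
```

With zero hidden pre-activations, $1 - \tanh^2(0) = 1$, so the two conventions happen to agree there; for any other $s^{(l)}$ the $\tanh'$ factor changes the hidden-layer deltas.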

sharov-am · Apr 20 '25