Learning-from-data
Exercise 7.8 question
Hi.
Why do you assume that $\theta'(s^{(l)}_j)=1$ when computing $\delta^{(2)}$ and $\delta^{(1)}$? The identity output transform applies only to the final layer; all earlier layers use the $\tanh(x)$ transformation. So we should calculate $\delta^{(2)}$ and $\delta^{(1)}$ as in Example 7.1, given the new value of $\delta^{(3)}$, namely $\delta^{(l)} = \theta'(s^{(l)}) \otimes \left[W^{(l+1)}\delta^{(l+1)}\right]$ for $l=1,2$, where $\theta(s^{(l)}) = \tanh(s^{(l)})$.
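To make the point concrete, here is a minimal NumPy sketch of the backward pass being described: identity output (so $\theta'(s^{(3)})=1$ there only) and $\tanh$ hidden layers (so $\theta'(s^{(l)}) = 1-\tanh^2(s^{(l)})$ for the earlier layers). The network sizes, random weights, and the squared-error loss are my own illustrative assumptions, not taken from the exercise.

```python
import numpy as np

# Hypothetical 3-layer network (illustrative shapes, not from the exercise):
# two tanh hidden layers, identity output transform at the final layer.
rng = np.random.default_rng(0)
W = [None,
     rng.standard_normal((3, 2)),   # W^(1): (d^(0)+1) x d^(1)
     rng.standard_normal((3, 2)),   # W^(2): (d^(1)+1) x d^(2)
     rng.standard_normal((3, 1))]   # W^(3): (d^(2)+1) x d^(3)

x = np.array([1.0, 0.5])
y = 1.0

# Forward pass: x^(l) carries the bias coordinate 1 at position 0.
xs, ss = [np.concatenate(([1.0], x))], [None]
for l in (1, 2, 3):
    s = W[l].T @ xs[l - 1]
    ss.append(s)
    theta = s if l == 3 else np.tanh(s)      # identity ONLY at the output
    xs.append(np.concatenate(([1.0], theta)))

# Backward pass for squared error e = (x^(3) - y)^2.
# Output layer: theta'(s^(3)) = 1, so delta^(3) = 2 (x^(3) - y).
delta = [None] * 4
delta[3] = 2.0 * (xs[3][1:] - y)
# Hidden layers: theta'(s^(l)) = 1 - tanh(s^(l))^2, NOT 1.
for l in (2, 1):
    back = W[l + 1] @ delta[l + 1]           # includes the bias row; drop it
    delta[l] = (1.0 - np.tanh(ss[l]) ** 2) * back[1:]
```

The gradient then follows as $\partial e/\partial W^{(l)} = x^{(l-1)} \left(\delta^{(l)}\right)^T$, which is where using $\theta'=1$ in the hidden layers would give the wrong answer.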