async-rl
Why does the value loss need to be divided by 2 in line 108 of a3c.py?
v_loss += (v - R) ** 2 / 2
But the original paper just calculates the derivative of (V - R)^2, right?
You also mention in https://github.com/muupan/async-rl/wiki that they multiply the gradients of V by 0.5, so a3c.py has the parameters (pi_loss_coef=1.0, v_loss_coef=0.5). But why is there another 0.5 in v_loss?
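To make the arithmetic behind the question concrete, here is a minimal sketch of how the two factors combine. It assumes that v_loss_coef simply multiplies v_loss before backpropagation, which is not confirmed here; `value_loss_grad` and the numbers for v and R are hypothetical and do not come from a3c.py.

```python
def value_loss_grad(v, R, inner_scale, v_loss_coef):
    """Analytic d/dv of v_loss_coef * inner_scale * (v - R) ** 2."""
    return v_loss_coef * inner_scale * 2.0 * (v - R)


# Hypothetical value estimate and return, chosen so that v - R = 0.5.
v, R = 1.5, 1.0

# Plain squared error, no scaling at all: gradient is 2 * (v - R).
grad_raw = value_loss_grad(v, R, inner_scale=1.0, v_loss_coef=1.0)

# Squared error with the gradient multiplied by 0.5, as described in the wiki.
grad_halved = value_loss_grad(v, R, inner_scale=1.0, v_loss_coef=0.5)

# (v - R) ** 2 / 2 combined with v_loss_coef=0.5 (assumed multiplicative).
grad_both = value_loss_grad(v, R, inner_scale=0.5, v_loss_coef=0.5)

print(grad_raw, grad_halved, grad_both)
```

Under that assumption, halving the gradient of (V - R)^2 gives (v - R), while combining the / 2 in v_loss with v_loss_coef=0.5 gives 0.5 * (v - R), which is the extra factor being asked about.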