Deep_reinforcement_learning_Course
N-step returns
Do the algorithms in this course compute n-step returns when propagating rewards? The Sonic A2C code looks like it only uses 1-step returns, V(S) = R(S) + V(S_next), but I can't tell for sure because I'm not very familiar with GAE.
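To make the distinction in the question concrete, here is a minimal sketch of how 1-step targets, n-step returns, and GAE relate. It is not taken from the Sonic A2C code. The function names (`n_step_returns`, `gae_advantages`), the toy numbers, and the assumption that no episode ends inside the rollout are all mine, for illustration only.

```python
def n_step_returns(rewards, values, last_value, gamma=0.99, n=1):
    """Targets G_t = r_t + ... + gamma^(n-1) * r_(t+n-1) + gamma^n * V(s_(t+n)).

    Hypothetical helper. Assumes no terminal state inside the rollout, and
    truncates the sum at the rollout end, bootstrapping from last_value.
    """
    T = len(rewards)
    boot = list(values) + [last_value]  # boot[t] = V(s_t), boot[T] = V(s_T)
    targets = []
    for t in range(T):
        end = min(t + n, T)  # index of the state we bootstrap from
        g = 0.0
        for k in range(t, end):
            g += gamma ** (k - t) * rewards[k]
        g += gamma ** (end - t) * boot[end]
        targets.append(g)
    return targets


def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation via the usual backward recursion.

    Hypothetical helper, same no-terminal-state assumption as above.
    """
    T = len(rewards)
    boot = list(values) + [last_value]
    advantages = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        # 1-step TD error: r_t + gamma * V(s_(t+1)) - V(s_t)
        delta = rewards[t] + gamma * boot[t + 1] - boot[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Toy rollout of 3 steps (made-up numbers).
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3]
last_value = 0.2

one_step = n_step_returns(rewards, values, last_value, gamma=0.9, n=1)
full = n_step_returns(rewards, values, last_value, gamma=0.9, n=len(rewards))
adv_lam0 = gae_advantages(rewards, values, last_value, gamma=0.9, lam=0.0)
adv_lam1 = gae_advantages(rewards, values, last_value, gamma=0.9, lam=1.0)
```

With `lam=0`, the GAE advantage is the 1-step TD error, i.e. the 1-step target minus V(S). With `lam=1`, it equals the longest n-step return available in the rollout minus V(S). Values of `lam` in between give an exponentially weighted mix of all the n-step estimates, so code that uses GAE with `lam > 0` propagates reward over more than one step even though each line of the recursion only looks one step ahead.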