Question about the effect of discount factor and done mask when calculating the target value?

Open puyuan1996 opened this issue 3 years ago • 0 comments

Thank you very much for open-sourcing your code.

This is a common definition of a target value in classical RL:
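For concreteness, a standard n-step bootstrapped target can be written as below. The notation is an assumption for illustration and may not match the formula originally referenced here: $r_{t+i}$ is the reward at step $t+i$, $\gamma$ is the discount factor, $n$ is the number of TD steps, $v_{t+n}$ is the bootstrap value, and $d_{t+n} \in \{0, 1\}$ indicates whether the state at step $t+n$ is terminal (done).

$$
z_t = \sum_{i=0}^{n-1} \gamma^{i}\, r_{t+i} \;+\; \gamma^{n}\, (1 - d_{t+n})\, v_{t+n}
$$

With $d_{t+n} = 1$ the bootstrap term vanishes, so the target reduces to the discounted sum of the observed rewards.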

I'm a little confused about how the target value is calculated in reanalyze_worker.py:

Why is the bootstrap value (here value_lst) not multiplied by discount_factor^td_steps, and why is the bootstrap value not masked when the target obs is a done state?
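To make the two questions concrete, here is a minimal sketch of the calculation I would have expected. It is not the code from reanalyze_worker.py: the function name `n_step_target` and its parameters are hypothetical, and it assumes a single scalar bootstrap value rather than a batched value_lst.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    bootstrap_value: float,
    discount_factor: float,
    done: bool,
) -> float:
    """Hypothetical helper: discounted n-step return plus a masked bootstrap.

    rewards          -- the n rewards observed between the current step and
                        the bootstrap step (n plays the role of td_steps)
    bootstrap_value  -- value estimate at the bootstrap step
    discount_factor  -- gamma, assumed to lie in [0, 1]
    done             -- True if the bootstrap step is a terminal state
    """
    td_steps = len(rewards)

    # Discounted sum of the observed rewards: sum_i gamma^i * r_i.
    reward_sum = sum(discount_factor ** i * r for i, r in enumerate(rewards))

    # Done mask: a terminal state has no future return, so drop the bootstrap.
    mask = 0.0 if done else 1.0

    # The bootstrap value is discounted by gamma^td_steps before being added.
    return reward_sum + mask * discount_factor ** td_steps * bootstrap_value


if __name__ == "__main__":
    # Same rewards and bootstrap value, with and without a terminal state.
    print(n_step_target([1.0, 1.0], 10.0, 0.5, done=False))
    print(n_step_target([1.0, 1.0], 10.0, 0.5, done=True))
```

Without the discount_factor^td_steps term, the bootstrap value would be weighted as if it were an immediate reward, and without the mask the target would include value from beyond the end of the episode.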

Looking forward to your reply!

Dec 28 '22 09:12 puyuan1996