EfficientZero

Question about the effect of the discount factor and done mask when calculating the target value

Open puyuan1996 opened this issue 3 years ago • 0 comments

Thank you very much for open-sourcing your code.

This is a common definition of a target value in classical RL: [image]
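The image is not reproduced here. The definition it presumably showed is the standard n-step bootstrapped target, written below with assumed notation: $\gamma$ is the discount factor, $n$ is the number of TD steps (`td_steps`), $r_{t+i}$ are the observed rewards, $v(s_{t+n})$ is the bootstrap value, and $d_{t+n} \in \{0, 1\}$ is a done flag that equals 1 when $s_{t+n}$ is terminal.

$$
z_t = \sum_{i=0}^{n-1} \gamma^{i}\, r_{t+i} \;+\; \gamma^{n}\,\bigl(1 - d_{t+n}\bigr)\, v(s_{t+n})
$$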

I'm a little confused about how the target value is calculated in reanalyze_worker.py:

Why do we not multiply the bootstrap value (here, value_lst) by discount_factor^td_steps, and why do we not mask the bootstrap value when the target obs is a done state?
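To make the question concrete, here is a minimal sketch of the target I would have expected. It is not taken from reanalyze_worker.py; the function name `n_step_target` and its parameters are hypothetical, and it assumes a single trajectory position with `td_steps = len(rewards)`.

```python
def n_step_target(rewards, bootstrap_value, discount, done):
    """Hypothetical n-step target, for illustration only.

    rewards:          observed rewards r_t, ..., r_{t+n-1}
    bootstrap_value:  value estimate at the target obs s_{t+n}
    discount:         discount factor gamma
    done:             True if the target obs s_{t+n} is a done state
    """
    td_steps = len(rewards)

    # Discounted sum of the observed rewards.
    reward_sum = sum((discount ** i) * r for i, r in enumerate(rewards))

    # Zero out the bootstrap value when the target obs is terminal.
    mask = 0.0 if done else 1.0

    # Discount the bootstrap value by gamma^td_steps before adding it.
    return reward_sum + mask * (discount ** td_steps) * bootstrap_value
```

With `rewards=[1.0, 1.0, 1.0]`, `bootstrap_value=10.0`, and `discount=0.5`, the reward sum is 1.75 and the discounted bootstrap term is 0.125 * 10 = 1.25, so the target is 3.0 when the target obs is not done and 1.75 when it is.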

Looking forward to your reply!

puyuan1996 · Dec 28 '22 09:12