[BUG]: ChatGPT: why is the total reward computed as reward = r - kl_coef * kl rather than total_reward = r + gamma * critic(next_states)?

Open liukaiyueyuo opened this issue 2 years ago • 1 comments

ChatGPT: why is the total reward computed as reward = r - kl_coef * kl rather than total_reward = r + gamma * critic(next_states)?


Feb 21 '23 03:02 liukaiyueyuo

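As we see it, the RL training process here is a one-step process, which means there is no next_state. With no next state, the bootstrap term gamma * critic(next_states) has nothing to evaluate, so the reward is just r with the KL penalty subtracted.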
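To make the point concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names `kl_penalized_reward` and `td_target`, the `terminal` flag, and all the numbers are assumptions made up for illustration. It shows that when every step is terminal, the usual TD target r + gamma * critic(next_states) reduces to the reward itself.

```python
def kl_penalized_reward(r: float, kl: float, kl_coef: float) -> float:
    """Reward in the one-step setting: the score r minus a KL penalty."""
    return r - kl_coef * kl


def td_target(r: float, gamma: float, next_value: float, terminal: bool) -> float:
    """Generic one-step TD target; the bootstrap term vanishes at a terminal step."""
    bootstrap = 0.0 if terminal else gamma * next_value
    return r + bootstrap


# Illustrative numbers only (hypothetical, not taken from the thread).
r, kl, kl_coef, gamma = 1.0, 0.5, 0.1, 0.99

reward = kl_penalized_reward(r, kl, kl_coef)

# In a one-step process every step is terminal, so next_value is never used,
# whatever value the critic would have returned for it.
target = td_target(reward, gamma, next_value=123.0, terminal=True)
```

In this sketch `target` equals `reward`, which is why a gamma * critic(next_states) term would add nothing under the one-step assumption. It would only matter if the episode were modeled as multiple steps.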

Feb 21 '23 03:02 ht-zhou

I'll close this issue now. Please reopen it if you have further questions.

Feb 22 '23 02:02 ht-zhou