Terminal states logic for sub-policies

Open ysaibhargav opened this issue 7 years ago • 1 comment

https://github.com/openai/mlsh/blob/58f527ab7e3397eeb723a7309852b6d8791d5c24/mlsh_code/rollouts.py#L123

Hi, shouldn't the logic for determining terminal states for sub-policies consider the case where the master action changes? If the action changes, shouldn't we designate the current state as terminal? It seems that the current implementation can bootstrap from a different sub-policy network when such a case arises.
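
A minimal sketch of the proposed change, assuming a GAE-style advantage computation over a single rollout. The function name `subpolicy_advantages` and the argument names (`rewards`, `values`, `dones`, `master_actions`) are hypothetical and are not taken from `rollouts.py`. `values` is assumed to hold one extra entry for the state after the last step, and each entry is assumed to come from whichever sub-policy was active at that step.

```python
import numpy as np

def subpolicy_advantages(rewards, values, dones, master_actions,
                         gamma=0.99, lam=0.95):
    """GAE where a change of master action ends the sub-policy's segment.

    rewards, dones, master_actions have length T; values has length T + 1.
    All names here are illustrative, not the ones used in the MLSH code.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=bool)
    master_actions = np.asarray(master_actions)
    T = len(rewards)

    # switch[t] is True when step t + 1 is handled by a different sub-policy.
    switch = np.zeros(T, dtype=bool)
    switch[:-1] = master_actions[1:] != master_actions[:-1]

    # Treat both environment termination and a sub-policy switch as terminal,
    # so values[t + 1] from another sub-policy is never used for bootstrapping.
    terminal = dones | switch
    nonterminal = (~terminal).astype(np.float64)

    advantages = np.zeros(T, dtype=np.float64)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] * nonterminal[t] - values[t]
        running = delta + gamma * lam * nonterminal[t] * running
        advantages[t] = running
    return advantages
```

With `master_actions = [0, 0, 1, 1]`, the step at index 1 is treated as terminal, so neither the bootstrap value nor the accumulated advantage from the second sub-policy's steps flows back into the first sub-policy's estimates. Zeroing the bootstrap at a switch is one reading of "designate the current state as terminal"; bootstrapping from the same sub-policy's own value of the next state would be an alternative.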

Mar 17 '18 00:03 ysaibhargav

I agree with you. We probably shouldn't be folding the value estimate of a different policy into a sub-policy's advantage. They are practically different MDPs.

Apr 21 '18 22:04 AaronHavens