mlsh
mlsh copied to clipboard
Terminal states logic for sub-policies
https://github.com/openai/mlsh/blob/58f527ab7e3397eeb723a7309852b6d8791d5c24/mlsh_code/rollouts.py#L123
Hi, shouldn't the logic for determining terminal states for sub-policies consider the case where the master action changes? If the action changes, shouldn't we designate the current state as terminal? It seems that the current implementation can bootstrap from a different sub-policy network when such a case arises.
I agree with you. We probably shouldn't be rolling in the expectation of a different policy into a sub-policy advantage. They are practically different MDP's.