Aaron Havens comments

Repositories
Issues
Comments

Results 2 comments of


                                            Aaron Havens

Observation of MovementBandits env

@natsuki14 It is the meta-learning objective of the master policy to choose the best sub-policy to reach the current goal. The correct goal remains a latent condition of the MDP....

Terminal states logic for sub-policies

I agree with you. We probably shouldn't be rolling in the expectation of a different policy into a sub-policy advantage. They are practically different MDP's.