David Yanguas Rojas

2 comments by David Yanguas Rojas

It is not hard to solve. I added a `list` cast to the `lin_policy` object (around line 23) and then it worked:

```
lin_policy = np.load(args.expert_policy_file)
lin_policy = list(lin_policy.items())
lin_policy...
```
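For anyone who wants to see why the cast helps, here is a minimal sketch that does not use the real policy file. It assumes the file is an `.npz` archive; the file name `policy.npz`, the `weights` array and the variable names are made up for the illustration. As far as I can tell, in recent NumPy versions `items()` on a loaded archive returns a view that cannot be indexed under Python 3, while a list can.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the expert policy file: a single array
# saved without a name, so NumPy stores it under the default key "arr_0".
weights = np.arange(6, dtype=np.float64).reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "policy.npz")
np.savez(path, weights)

with np.load(path) as archive:
    # list() materializes the (name, array) pairs so they can be
    # indexed by position afterwards.
    entries = list(archive.items())

# The first pair holds the key and the stored array.
name, first_array = entries[0]
print(name, first_array.shape)
```

Indexing the list by position only makes sense when the archive holds a known, small number of arrays; looking an array up by its key is the more explicit alternative.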

They explain that in the article... the idea is to remove the survival bonus from the reward function in order to avoid some local optima. In hopper the survival bonus...
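A toy sketch of the local-optimum argument, not the article's code: the function name `shifted_return`, the bonus value and both reward lists are hypothetical numbers picked only to make the effect visible. It assumes the bonus is a constant added to every step's reward.

```python
from typing import Sequence


def shifted_return(step_rewards: Sequence[float], survival_bonus: float) -> float:
    """Sum the per-step rewards after subtracting a constant per-step bonus."""
    return sum(r - survival_bonus for r in step_rewards)


# Hypothetical per-step bonus, chosen for illustration only.
BONUS = 1.0

# Hypothetical rollout: the agent stands still for 10 steps and
# collects nothing but the bonus.
idle = [1.0] * 10

# Hypothetical rollout: the agent moves forward but falls after 2 steps.
moving = [3.0, 2.5]

raw_idle, raw_moving = sum(idle), sum(moving)
adj_idle = shifted_return(idle, BONUS)
adj_moving = shifted_return(moving, BONUS)

print("raw:", raw_idle, raw_moving)
print("bonus removed:", adj_idle, adj_moving)
```

With the raw rewards, standing still scores higher than moving, which is the kind of local optimum the comment refers to. Once the bonus is subtracted, the moving rollout comes out ahead.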