population-irl
(Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards
It currently takes around 2 seconds to load the pirl module. Most of the time this isn't a big deal, but due to a combination of: 1. We restart each...
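One rough way to put a number on the load time is to time the import directly. This is a minimal sketch, not part of the project: `time_import` is a hypothetical helper, and `json` stands in for `pirl` so the snippet runs anywhere.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds spent importing module_name in this process."""
    # Evict only the top-level entry so its module body runs again.
    # Submodules and dependencies stay cached, so for a package this
    # underestimates a true cold start in a fresh interpreter.
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# "json" is a stand-in; substitute "pirl" where that module is installed.
elapsed = time_import("json")
print(f"import took {elapsed:.4f}s")
```

For a per-module breakdown in a fresh interpreter, CPython 3.7+ also accepts `python -X importtime -c "import pirl"`.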
New requirements: 1. Make the lava not be giant columns 2. Create a parameter for different types of tiles with reward distributions (e.g. "Water" in the range "-1 to -5",...
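The tile-type parameter could look something like the sketch below. Only the "Water" range comes from the issue text; the other tile names, their ranges, the uniform sampling, and the names `TILE_REWARD_RANGES` and `sample_tile_rewards` are assumptions made for illustration.

```python
import random

# Hypothetical parameter: tile name -> (low, high) reward range.
TILE_REWARD_RANGES = {
    "Water": (-5.0, -1.0),    # range taken from the issue's example
    "Lava": (-10.0, -10.0),   # assumed: fixed penalty
    "Grass": (0.0, 0.0),      # assumed: neutral tile
}


def sample_tile_rewards(grid, ranges, rng):
    """Draw one reward per cell, uniformly from that tile type's range."""
    return [[rng.uniform(*ranges[tile]) for tile in row] for row in grid]


rng = random.Random(0)  # seeded so the draw is reproducible
grid = [["Grass", "Water"], ["Lava", "Grass"]]
rewards = sample_tile_rewards(grid, TILE_REWARD_RANGES, rng)
print(rewards)
```

Keeping the ranges in a plain dictionary means new tile types can be added without touching the sampling code.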
Should be based on this: https://github.com/openai/gym/blob/master/gym/envs/mujoco/ant.py I think we probably just need to change the reward function part of that code. So either just copying that, or overwriting the step...
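As an illustration of the "change only the reward" option, here is a small wrapper that delegates to an existing environment and swaps in a different reward. It is a sketch under stated assumptions: it assumes the older gym convention where `step` returns `(obs, reward, done, info)`, and `RewardOverride`, `DummyEnv`, and the reward function are hypothetical stand-ins, not the Ant environment itself.

```python
class RewardOverride:
    """Wrap an environment and replace the reward returned by step()."""

    def __init__(self, env, reward_fn):
        self.env = env
        self.reward_fn = reward_fn  # called as reward_fn(obs, action)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Discard the wrapped environment's reward and compute our own.
        obs, _, done, info = self.env.step(action)
        return obs, self.reward_fn(obs, action), done, info


class DummyEnv:
    """Stand-in environment so the sketch runs without MuJoCo."""

    def reset(self):
        return 0.0

    def step(self, action):
        return float(action), 1.0, False, {}


env = RewardOverride(DummyEnv(), reward_fn=lambda obs, action: -abs(obs))
env.reset()
obs, reward, done, info = env.step(3)
print(obs, reward, done)
```

Wrapping avoids copying the environment's source, at the cost of only seeing what `step` exposes. A reward that depends on simulator internals would still need the copy-and-edit route.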
It's pretty slow right now; I suspect value iteration is the bottleneck. It would be good to do some profiling to pin it down. It's possible that moving things to GPU would speed things...
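A first profiling pass could use the standard library's `cProfile`. The sketch below profiles a toy tabular value iteration on a small chain MDP. The `value_iteration` function and the MDP are illustrative assumptions, not the project's implementation.

```python
import cProfile
import io
import pstats


def value_iteration(transitions, rewards, gamma=0.9, tol=1e-6):
    """transitions[s][a] is a list of (prob, next_state); rewards[s] is per-state."""
    values = [0.0] * len(rewards)
    while True:
        delta = 0.0
        for s in range(len(rewards)):
            # Best expected next-state value over all actions in state s.
            best = max(
                sum(p * values[s2] for p, s2 in outcomes)
                for outcomes in transitions[s]
            )
            new_value = rewards[s] + gamma * best
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value  # in-place (Gauss-Seidel style) update
        if delta < tol:
            return values


# Toy chain: actions move left or right deterministically; reward 1 in the last state.
n = 20
transitions = [[[(1.0, max(s - 1, 0))], [(1.0, min(s + 1, n - 1))]] for s in range(n)]
rewards = [0.0] * (n - 1) + [1.0]

profiler = cProfile.Profile()
profiler.enable()
values = value_iteration(transitions, rewards)
profiler.disable()

# Collect the five most expensive entries by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

If the report shows the time concentrated in the inner loops, vectorizing the backup would be a natural step to try before moving anything to GPU.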