coltra-rl
A modular implementation of PPO, and soon hopefully other algorithms.
Running into issues replicating your work; your help would be greatly appreciated
No point in switching formats back and forth; just put everything into a tensor.
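A minimal sketch of what "everything as a tensor" could mean, assuming observations arrive as NumPy arrays; the helper name `to_tensor` and its behaviour are illustrative, not coltra's actual API:

```python
import numpy as np
import torch


def to_tensor(obs: np.ndarray, device: str = "cpu") -> torch.Tensor:
    """Convert a NumPy observation to a float32 tensor once, at the boundary.

    Hypothetical helper: keeping everything as tensors downstream avoids
    repeated numpy <-> torch round-trips inside the training loop.
    """
    return torch.as_tensor(obs, dtype=torch.float32, device=device)


# Example: a batch of observations collected from several agents
batch = np.stack([np.zeros(4), np.ones(4)])
obs_tensor = to_tensor(batch)  # shape (2, 4), stays a tensor from here on
```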
With MultiGymEnv, it's `agent&env=0`. I need to figure this out in general, and for Unity specifically, because their `?team=0` suffix is weird.
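One possible way to handle such agent IDs; the `name&env=idx` format is taken from the note above, and `parse_agent_id` is a hypothetical helper rather than part of coltra:

```python
def parse_agent_id(agent_id: str) -> tuple[str, int]:
    """Split an ID like 'agent&env=0' into the agent name and its env index.

    Hypothetical parser for the naming scheme mentioned above; Unity's
    '?team=0' suffix would need its own handling.
    """
    name, _, env_part = agent_id.partition("&env=")
    return name, int(env_part) if env_part else 0


print(parse_agent_id("agent&env=0"))  # ('agent', 0)
```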
Jax seems pretty dope, and the realm of PyTorch RL libraries is somewhat saturated
Right now, there is a vague correspondence between `Box(...)` and `Action(continuous=...)`, and between `Discrete(...)` and `Action(discrete=...)`. In principle, all of these can support Dict action/observation spaces, but I don't want...
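A sketch of what making that correspondence explicit could look like. The `Action` dataclass here is a simplified stand-in for coltra's container, and `sample_action` is a hypothetical helper; only the Box/Discrete cases from the note above are covered:

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np
from gym.spaces import Box, Discrete, Space


@dataclass
class Action:
    """Simplified stand-in for coltra's Action container (illustrative only)."""
    continuous: Optional[np.ndarray] = None
    discrete: Optional[int] = None


def sample_action(space: Space) -> Action:
    """Map a Gym space sample onto the matching Action field.

    Box -> continuous, Discrete -> discrete; Dict spaces are not handled here.
    """
    if isinstance(space, Box):
        return Action(continuous=space.sample())
    if isinstance(space, Discrete):
        return Action(discrete=int(space.sample()))
    raise NotImplementedError(f"Unsupported space: {type(space).__name__}")
```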
Right now it's "input_size", "num_actions", and "discrete". The first two are inconsistent in style; need to make them more intuitive.
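A hypothetical renaming sketch, just to show one consistent style for the three keys; the new field names are illustrative and not a decided interface:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical config: the three keys from the note above in one consistent style."""
    input_size: int    # was "input_size"
    action_size: int   # was "num_actions"; illustrative rename only
    discrete: bool     # unchanged
```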
Need to make sure that if an agent with a wrapper is saved on a GPU, it can be gracefully loaded on a CPU, and the other way around.
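A minimal sketch of the GPU-to-CPU direction using `torch.load`'s `map_location`; the function name and path are illustrative, and the CPU-to-GPU direction would just be a `.to(device)` after loading:

```python
import torch


def load_agent_state(path: str) -> dict:
    """Load a saved agent's state dict onto whatever device is available.

    map_location remaps GPU-saved tensors to CPU when no CUDA device exists;
    the helper and path are hypothetical, not coltra's saving API.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return torch.load(path, map_location=device)
```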
Currently the pytest tests use some arbitrary network architectures, environments, etc. By using `pytest.mark.parametrize`, this can be expanded into much more reliable tests, particularly across different sizes and discreteness; see the sketch below.
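An illustrative parametrized test in that spirit; the placeholder linear "policy" and the chosen size grids are assumptions, not coltra's real test fixtures:

```python
import pytest
import torch


@pytest.mark.parametrize("input_size", [4, 16, 64])
@pytest.mark.parametrize("num_actions", [2, 5])
@pytest.mark.parametrize("discrete", [True, False])
def test_policy_output_shape(input_size, num_actions, discrete):
    """Run the same shape check over every size/discreteness combination."""
    # Placeholder model: a discrete head outputs logits, a continuous head means + stds
    out_size = num_actions if discrete else 2 * num_actions
    model = torch.nn.Linear(input_size, out_size)
    obs = torch.zeros(8, input_size)
    assert model(obs).shape == (8, out_size)
```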
Self-explanatory