torchrl
Highly Modular and Scalable Reinforcement Learning
Allow ONNX export of computational graphs so that they can be used outside this repository.
Soft Actor-Critic (SAC)
Paper: https://arxiv.org/abs/1801.01290
Blog post: https://bair.berkeley.edu/blog/2018/12/14/sac/
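The heart of SAC is its entropy-regularized ("soft") Bellman target plus a slowly moving target network. The following is a minimal pure-Python sketch, not torchrl code. It assumes the commonly used variant with two target critics (taking their minimum) and no separate state-value network, with a fixed temperature `alpha`. The 1801.01290 paper itself also learns a state-value function. The names `sac_td_target` and `polyak_update` are hypothetical.

```python
from typing import List, Sequence


def sac_td_target(
    rewards: Sequence[float],
    dones: Sequence[float],
    next_q1: Sequence[float],
    next_q2: Sequence[float],
    next_log_probs: Sequence[float],
    gamma: float = 0.99,
    alpha: float = 0.2,
) -> List[float]:
    """Soft Bellman targets: y = r + gamma * (1 - d) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    targets = []
    for r, d, q1, q2, logp in zip(rewards, dones, next_q1, next_q2, next_log_probs):
        # Use the smaller of the two target critics to curb overestimation,
        # then add the entropy bonus (-alpha * log-probability of the next action).
        soft_value = min(q1, q2) - alpha * logp
        # Terminal transitions (d = 1) do not bootstrap from the next state.
        targets.append(r + gamma * (1.0 - d) * soft_value)
    return targets


def polyak_update(
    target_params: Sequence[float], online_params: Sequence[float], tau: float = 0.005
) -> List[float]:
    """Soft target-network update: theta_targ <- (1 - tau) * theta_targ + tau * theta."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

In a real implementation these would operate on batched tensors and network parameters; plain lists keep the arithmetic easy to check.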
Storing past trajectories (for example, in a replay buffer) is helpful for algorithms that learn off-policy. It is not yet clear whether this is needed.
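A minimal sketch of such storage is a fixed-capacity replay buffer that keeps the most recent transitions and samples uniform minibatches. The `Transition` fields and the `ReplayBuffer` class are hypothetical illustrations, not an existing torchrl API.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    state: object
    action: object
    reward: float
    next_state: object
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions for off-policy learning."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) drops the oldest transition once the buffer is full.
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._storage)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement within a single batch.
        return self._rng.sample(list(self._storage), batch_size)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3)
    for step in range(5):
        buffer.add(Transition(step, 0, 1.0, step + 1, False))
    # Only the three most recent transitions (states 2, 3, 4) remain.
    print(len(buffer), [t.state for t in buffer.sample(2)])
```

Copying to a list before sampling keeps the example simple; a production buffer would usually preallocate arrays and sample indices directly.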
Most new algorithms will require a built-in hyperparameter tuning framework.
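The note does not say which tuning strategy is intended. As one possible starting point, here is a minimal random-search sketch: it draws configurations uniformly from a discrete search space and keeps the best one. The `objective` callable is an assumption; in practice it would train an agent with the given config and return a score such as mean evaluation return. `random_search` and the toy objective are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, object]


def random_search(
    objective: Callable[[Config], float],
    space: Dict[str, List[object]],
    num_trials: int,
    seed: int = 0,
) -> Tuple[Config, float]:
    """Sample configs uniformly from `space`; return the best (config, score) seen."""
    rng = random.Random(seed)
    best_config: Config = {}
    best_score = float("-inf")
    for _ in range(num_trials):
        # Draw one value per hyperparameter, independently of the others.
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    # Toy stand-in for "train and evaluate": peaks at lr=3e-4, gamma=0.99.
    def toy_objective(cfg: Config) -> float:
        return -abs(cfg["lr"] - 3e-4) - abs(cfg["gamma"] - 0.99)

    search_space = {"lr": [1e-4, 3e-4, 1e-3], "gamma": [0.95, 0.99]}
    print(random_search(toy_objective, search_space, num_trials=20))
```

Random search is easy to parallelize, since trials are independent, which would fit the distributed-training goal below.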
This can be achieved with `Horovod`. As long as we respect the MPI environment variables, porting existing code to support distributed training should be fairly simple (hopefully!).
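In Horovod itself, `hvd.init()` followed by `hvd.rank()`, `hvd.size()` and `hvd.local_rank()` exposes each process's identity. The sketch below only illustrates the environment-variable side. It assumes the job is launched through Open MPI's `mpirun` (or `horovodrun` using it), which exports `OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_SIZE` and `OMPI_COMM_WORLD_LOCAL_RANK`; other MPI implementations use different names. `WorkerInfo`, `worker_info`, `shard` and `worker_seed` are hypothetical helpers.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class WorkerInfo:
    rank: int        # global index of this process
    size: int        # total number of processes
    local_rank: int  # index on this machine (e.g. which GPU to use)


def worker_info(env: Optional[Mapping[str, str]] = None) -> WorkerInfo:
    """Read Open MPI's per-process variables, defaulting to a single process."""
    env = os.environ if env is None else env
    return WorkerInfo(
        rank=int(env.get("OMPI_COMM_WORLD_RANK", 0)),
        size=int(env.get("OMPI_COMM_WORLD_SIZE", 1)),
        local_rank=int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", 0)),
    )


def shard(items: Sequence[T], info: WorkerInfo) -> List[T]:
    # Each worker takes every size-th item, starting at its own rank.
    return list(items[info.rank::info.size])


def worker_seed(base_seed: int, info: WorkerInfo) -> int:
    # Distinct seeds so that workers' environments explore differently.
    return base_seed + info.rank
```

Because the defaults describe a single process, the same script runs unchanged with plain `python` and under `mpirun`.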
The current list of experiments is sufficient for a proof of concept (POC). We need to support experiments on more complicated environments so that future experiments can be done faster. As a first step, we could do this...