Use NN with better sequential modeling ability

Open crizCraig opened this issue 5 years ago • 0 comments

We currently use an MLP for the actor and critic networks in PPO. This should be fine as long as the environment is fully observable, but an MLP takes a fixed-size input, so it scales poorly when the number of agents in the scene varies. See this TODO for context.
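To make the scaling concern concrete, here is a minimal sketch of what feeding a variable number of agents to a fixed-input MLP tends to require: padding up to a hard cap. The function name, `max_agents`, and `feat_dim` are hypothetical and not taken from the codebase.

```python
from typing import List, Tuple


def pad_agent_observations(
    agents: List[List[float]], max_agents: int, feat_dim: int
) -> Tuple[List[float], List[int]]:
    """Flatten per-agent features into the fixed-length vector an MLP expects.

    Hypothetical helper for illustration only. Returns the flat observation
    plus a 0/1 mask marking which agent slots hold real data.
    """
    if len(agents) > max_agents:
        # The MLP input layer cannot grow at runtime, so extra agents
        # have to be rejected (or dropped) here.
        raise ValueError(f"got {len(agents)} agents, cap is {max_agents}")

    flat: List[float] = []
    mask: List[int] = []
    for slot in range(max_agents):
        if slot < len(agents):
            if len(agents[slot]) != feat_dim:
                raise ValueError("every agent needs feat_dim features")
            flat.extend(agents[slot])
            mask.append(1)
        else:
            # Empty slots are zero-padded; the network still spends
            # weights on them.
            flat.extend([0.0] * feat_dim)
            mask.append(0)
    return flat, mask


# Two agents present, but the input is sized for four.
obs, mask = pad_agent_observations(
    [[1.0, 2.0], [3.0, 4.0]], max_agents=4, feat_dim=2
)
```

The input width is always `max_agents * feat_dim`, so raising the cap means resizing and retraining the first layer, and the slot order of agents becomes something the network has to learn around.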

Also, we will eventually need to introduce partial observability of other agents to simulate the occlusion that occurs with real-world vehicles, so an NN capable of some type of memory / sequential modeling will be necessary. OpenAI used LSTMs with PPO for Dota 2, so there may be some info on doing that. Transformers also seem to handle long-term dependencies and sequential modeling more efficiently, and have been successfully applied to RL in DeepMind's GTrXL work.
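As a rough starting point, a recurrent actor-critic in PyTorch might look like the sketch below. Everything here is an assumption for illustration: the class name, layer sizes, and the discrete action space are made up, and this is not how OpenAI or the current codebase structures its networks.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class RecurrentActorCritic(nn.Module):
    """Shared LSTM trunk with separate policy and value heads (hypothetical)."""

    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        # batch_first=True means inputs are (batch, time, features).
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, num_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs, state=None):
        # obs: (batch, time, obs_dim); state: optional (h, c) from the
        # previous call, or None to start from zeros.
        x = torch.tanh(self.encoder(obs))
        x, state = self.lstm(x, state)
        dist = Categorical(logits=self.policy_head(x))
        values = self.value_head(x).squeeze(-1)  # (batch, time)
        return dist, values, state


model = RecurrentActorCritic(obs_dim=8, num_actions=3)
obs = torch.zeros(2, 5, 8)          # 2 rollouts, 5 timesteps each
dist, values, state = model(obs)    # state carries memory to the next call
actions = dist.sample()             # (batch, time)
log_probs = dist.log_prob(actions)  # used in the PPO ratio
```

With a recurrent policy, the PPO update presumably needs to train on contiguous sequences rather than shuffled single steps, and the `(h, c)` state has to be reset at episode boundaries. A continuous action space would swap `Categorical` for something like a Gaussian head.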

Note: We should move to PyTorch first to make NN modifications much more tractable, fun, sane, pleasurable, easy, fast etc...

Feb 10 '20 20:02 crizCraig