Pytorch-DPPO
PyTorch implementation of Distributed Proximal Policy Optimization (DPPO): https://arxiv.org/abs/1707.02286
env: torch 1.8.1+cu111. Error: `UserWarning: Error detected in AddmmBackward. Traceback of forward call that caused the error: File "", line 1, in File "E:\A\envs\gym\lib\multiprocessing\spawn.py", line 105, in spawn_main exitcode =...`
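This warning comes from PyTorch's anomaly-detection mode; in newer PyTorch versions the usual trigger is an in-place modification of a tensor the backward pass still needs (for example, shared weights being updated by another process while a worker's graph still references them). A minimal sketch of one common workaround, with hypothetical names rather than this repo's code:

```python
import torch
import torch.nn as nn

# Hypothetical workaround sketch: build the backward graph on a
# process-local copy of the model so no other process mutates its
# parameters in place while gradients are still pending.
def sync_local_model(shared_model: nn.Module, local_model: nn.Module) -> None:
    local_model.load_state_dict(shared_model.state_dict())

# torch.autograd.set_detect_anomaly(True) is what prints the
# "Traceback of forward call that caused the error" message above;
# enable it only while debugging, since it slows training considerably.
```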
Usually PPO is used with continuous actions, but for OpenAI Five, shouldn't the actions be discrete? What is the technique that makes PPO applicable to Dota 2 actions?
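PPO itself only needs the log-probability of the taken action, so a discrete policy simply swaps the Gaussian head for a Categorical over action logits. A minimal sketch (not this repo's code; `DiscretePolicy` and its layer sizes are made up for illustration):

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class DiscretePolicy(nn.Module):
    """PPO-compatible policy with a categorical (discrete) action head."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs: torch.Tensor):
        dist = Categorical(logits=self.net(obs))  # discrete distribution
        action = dist.sample()
        return action, dist.log_prob(action)      # log-prob feeds the PPO ratio
```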
Thanks for the nice implementation in PyTorch, which made it easier for me to learn. Regarding the chief.py implementation, I have a question about the updates to the global weights. From the algorithm pseudocode in...
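For context, a rough sketch of the chief/worker synchronization pattern described in the DPPO paper (names here are hypothetical, not necessarily those in chief.py): workers accumulate gradients into the shared model and bump a shared counter, and once enough workers have reported, the chief applies one optimizer step to the global weights.

```python
import torch.multiprocessing as mp

def chief(shared_optimizer, counter: "mp.Value", lock: "mp.Lock",
          update_event: "mp.Event", workers_per_update: int):
    while True:
        update_event.wait()   # a worker signals it has pushed gradients
        update_event.clear()
        with lock:
            if counter.value >= workers_per_update:
                shared_optimizer.step()       # update the global weights
                shared_optimizer.zero_grad()  # clear accumulated gradients
                counter.value = 0
```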
After testing your PPO and comparing it with another implementation, I think your advantages need to be normalized: (advantages - advantages.mean()) / advantages.std(). For your reference.
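A minimal sketch of that normalization, with a small epsilon added (which the suggestion above omits) to avoid division by zero on near-constant batches:

```python
import torch

def normalize_advantages(advantages: torch.Tensor) -> torch.Tensor:
    # Zero-mean, unit-std advantages tend to stabilize the PPO ratio loss.
    return (advantages - advantages.mean()) / (advantages.std() + 1e-8)
```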
It seems that you should clamp the ratio, not surr1: https://github.com/alexis-jacq/Pytorch-DPPO/blob/master/ppo.py#L145
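For reference, a sketch of the standard PPO clipped surrogate the issue is pointing at: the clamp applies to the probability ratio itself, and each surrogate term multiplies the advantage, so only surr2 uses the clamped ratio.

```python
import torch

def clipped_surrogate(ratio: torch.Tensor, advantages: torch.Tensor,
                      clip_eps: float = 0.2) -> torch.Tensor:
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the objective, so the loss is the negated minimum.
    return -torch.min(surr1, surr2).mean()
```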