Pytorch-DPPO
PyTorch implementation of Distributed Proximal Policy Optimization (DPPO): https://arxiv.org/abs/1707.02286
env: torch 1.8.1+cu111. Error: `UserWarning: Error detected in AddmmBackward. Traceback of forward call that caused the error: File "", line 1, in File "E:\A\envs\gym\lib\multiprocessing\spawn.py", line 105, in spawn_main exitcode =...`
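This warning comes from PyTorch's anomaly-detection mode; in newer PyTorch versions the usual trigger is an in-place modification of a tensor the backward pass still needs (for example, shared weights being updated by another process while a worker's graph still references them). A minimal sketch of one common workaround, with hypothetical names rather than this repo's code:

```python
import torch
import torch.nn as nn

# Hypothetical workaround sketch: build the backward graph on a
# process-local copy of the model so no other process mutates its
# parameters in place while gradients are still pending.
def sync_local_model(shared_model: nn.Module, local_model: nn.Module) -> None:
    local_model.load_state_dict(shared_model.state_dict())

# torch.autograd.set_detect_anomaly(True) is what prints the
# "Traceback of forward call that caused the error" message above;
# enable it only while debugging, since it slows training considerably.
```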
Usually PPO is used with continuous actions, but for OpenAI Five, shouldn't the actions be discrete? What is the technique that makes PPO applicable to Dota 2 actions?
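PPO itself only needs the log-probability of the taken action, so a discrete policy simply swaps the Gaussian head for a Categorical over action logits. A minimal sketch (not this repo's code; `DiscretePolicy` and its layer sizes are made up for illustration):

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class DiscretePolicy(nn.Module):
    """PPO-compatible policy with a categorical (discrete) action head."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs: torch.Tensor):
        dist = Categorical(logits=self.net(obs))  # discrete distribution
        action = dist.sample()
        return action, dist.log_prob(action)      # log-prob feeds the PPO ratio
```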
Thanks for the nice implementation in PyTorch, which made it easier for me to learn. Regarding the chief.py implementation, I have a question about the updates to the global weights. From the algorithm pseudocode in...
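For context, a rough sketch of the chief/worker synchronization pattern described in the DPPO paper (names here are hypothetical, not necessarily those in chief.py): workers accumulate gradients into the shared model and bump a shared counter, and once enough workers have reported, the chief applies one optimizer step to the global weights.

```python
import torch.multiprocessing as mp

def chief(shared_optimizer, counter: "mp.Value", lock: "mp.Lock",
          update_event: "mp.Event", workers_per_update: int):
    while True:
        update_event.wait()   # a worker signals it has pushed gradients
        update_event.clear()
        with lock:
            if counter.value >= workers_per_update:
                shared_optimizer.step()       # update the global weights
                shared_optimizer.zero_grad()  # clear accumulated gradients
                counter.value = 0
```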
After testing your PPO and comparing it with another implementation, I think your advantages need to be normalized: (advantages - advantages.mean()) / advantages.std(). For your reference.
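A minimal sketch of that normalization, with a small epsilon added (which the suggestion above omits) to avoid division by zero on near-constant batches:

```python
import torch

def normalize_advantages(advantages: torch.Tensor) -> torch.Tensor:
    # Zero-mean, unit-std advantages tend to stabilize the PPO ratio loss.
    return (advantages - advantages.mean()) / (advantages.std() + 1e-8)
```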
It seems that you should clamp the ratio, not surr1: https://github.com/alexis-jacq/Pytorch-DPPO/blob/master/ppo.py#L145
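For reference, a sketch of the standard PPO clipped surrogate the issue is pointing at: the clamp applies to the probability ratio itself, and each surrogate term multiplies the advantage, so only surr2 uses the clamped ratio.

```python
import torch

def clipped_surrogate(ratio: torch.Tensor, advantages: torch.Tensor,
                      clip_eps: float = 0.2) -> torch.Tensor:
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the objective, so the loss is the negated minimum.
    return -torch.min(surr1, surr2).mean()
```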