policy-gradient-methods

Modular PyTorch implementation of policy gradient methods

Issues (4 results, sorted by recently updated)

Using sampled entropy rather than analytic entropy.
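For context, the difference is between estimating the entropy bonus from the sampled action's negative log-probability (a Monte Carlo estimate) and using the distribution's closed-form entropy. A minimal PyTorch sketch of the two quantities, assuming a Categorical policy head; the tensor shapes and variable names here are illustrative, not taken from this repo:

```python
import torch
from torch.distributions import Categorical

logits = torch.randn(4, 6)        # dummy batch: 4 states, 6 discrete actions
dist = Categorical(logits=logits)
actions = dist.sample()

# Sampled (Monte Carlo) entropy estimate: -log pi(a|s) for the sampled action.
# Unbiased, but high variance because it depends on which action was drawn.
sampled_entropy = -dist.log_prob(actions)

# Analytic entropy: closed form, -sum_a pi(a|s) log pi(a|s) over all actions.
# Lower variance, and usually preferred for the entropy bonus when tractable.
analytic_entropy = dist.entropy()

print(sampled_entropy.mean().item(), analytic_entropy.mean().item())
```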

According to Fig. 3 of the PPO paper (https://arxiv.org/abs/1707.06347), VPG should perform reasonably well after about 0.5M steps.
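For reference, the vanilla policy gradient (VPG) objective that figure compares against is just the log-probability of the taken action weighted by an advantage estimate, plus an entropy bonus. A minimal sketch under the assumption that advantages are already computed; the function and argument names are illustrative, not this repo's API:

```python
import torch

def vpg_loss(log_probs, advantages, entropies, ent_coef=0.01):
    """Vanilla policy gradient loss with an entropy bonus.

    log_probs:  log pi(a_t | s_t) for the actions actually taken
    advantages: advantage estimates A_t (e.g. returns minus a value baseline)
    entropies:  per-step policy entropy, used as an exploration bonus
    """
    # Maximize E[log pi * A] plus the entropy bonus, so negate to get a loss.
    policy_term = -(log_probs * advantages.detach()).mean()
    entropy_term = -ent_coef * entropies.mean()
    return policy_term + entropy_term
```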

Environments with discrete action spaces do not seem to be doing well at all. This might be a wrapper problem.
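One common wrapper bug with Discrete spaces is passing the action to `env.step` as a 0-d or length-1 tensor instead of a plain int. A minimal sketch of a discrete-action sampling step with a Categorical head, assuming a gym-style API; `policy_net` and the helper name are hypothetical:

```python
import torch
from torch.distributions import Categorical

def sample_discrete_action(policy_net, obs):
    """Sample a discrete action and return it as a plain int for env.step().

    policy_net: any module mapping an observation tensor to action logits
    obs:        a single observation as a 1-D tensor
    """
    logits = policy_net(obs.unsqueeze(0))   # add batch dimension
    dist = Categorical(logits=logits)
    action = dist.sample()
    # Convert to a Python int explicitly; handing the raw tensor to env.step()
    # is a frequent source of silent failures with Discrete action spaces.
    return int(action.item()), dist.log_prob(action)
```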