Rujikorn Charakorn
@vwxyzjn Sure. I'll report back after I've run the tests.
@vwxyzjn I have tested the proper version. The results look very good on PyBullet's HalfCheetah (hovering around 2200 after 1M steps, which is higher than the current version of PPO)...
@vwxyzjn I did not turn wandb tracking on. I'll do that tonight and send you the report link right after. And the PR should be simple enough. Should we try...
@vwxyzjn Sorry for the late reply. It seems like the improvement I reported was just noise :( It seems like the continuous control tasks do not benefit from using...
And the tracked stats are here: https://wandb.ai/51616/proper_ppo_entropy?workspace=user-51616
I followed your fix and got a small negative pi loss (around -0.00xx). Is this normal? Edit: I'm now using the alternative code and it produces a positive pi loss.
@jl1990 I use these two lines instead: `pi -= (1 - valids) * 1000` followed by `pi = log_softmax(pi)`. This should produce a positive pi loss.
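For context, a minimal sketch of what that masking does (assuming PyTorch-style tensors; the logits and variable names here are purely illustrative):

```python
import torch
import torch.nn.functional as F

# Push the logits of invalid actions far down before normalizing, so
# log_softmax assigns them effectively zero probability.
logits = torch.tensor([[2.0, 0.5, -1.0, 0.3]])   # raw policy logits (hypothetical values)
valids = torch.tensor([[1.0, 0.0, 1.0, 0.0]])    # 1 = valid action, 0 = invalid

logits = logits - (1 - valids) * 1000            # invalid logits drop to roughly -1000
log_probs = F.log_softmax(logits, dim=-1)        # invalid actions get log-prob near -1000
probs = log_probs.exp()                          # their probabilities are effectively 0
print(probs)  # e.g. tensor([[0.9526, 0.0000, 0.0474, 0.0000]])
```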
Is there any chance I can use this during training?
Cool! @gigayaya That would be amazing, since I think self-play is the bottleneck of this training loop. How much faster is it if you do self-play in parallel? Is it...
I would love to see the implementation, of course. @gigayaya You can just commit to this PR and I can read the code just fine. 👍 Also, have you heard...