Timo Klein

Results: 24 comments of Timo Klein

> I just glanced at the paper and would recommend prototyping a sac_atari.py to work with Atari games as done in the paper.

Getting to work on it!

> I...

The results of the original paper are in https://github.com/vwxyzjn/cleanrl/issues/266. There are quite a few games where its performance at 100k steps doesn't differ much from a random agent's, e.g. Frostbite. I...

Posting an update: I'm currently running 1M-step experiments on Seaquest (takes a while) with two versions of the algorithm: one with an implementation as close as possible to cleanrl's...

I ran some experiments on `MsPacman` and `Seaquest`. [Here's](https://wandb.ai/timo_kk/SAC-discrete/reports/SAC-discrete--VmlldzoyNjY0MzY3?accessToken=csgyf6higme5b9mfye95nm0ux3fu4q5jguw0c00u974lupw9p7ujkyhnqhvs1uy7) a link to a report with some results. The entropy regularization coefficient $\alpha$ has a tendency to explode when training longer...

> @timoklein Sometimes the target entropy may just be very high and hard to reach, and the loss can explode (as alpha will grow and grow), so usually I tune it a bit...
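To make the mechanism from that comment concrete, here's a minimal sketch of the standard automatic temperature update used in discrete SAC (my own illustration, not CleanRL's exact code; `num_actions = 18` and all names are assumptions):

```python
import torch

# Assumed setup: a batch of categorical policy outputs over |A| = 18 actions
# (the full Atari action set) and the 0.98 scaling from the SAC-discrete paper.
num_actions = 18
target_entropy = 0.98 * torch.log(torch.tensor(float(num_actions)))

log_alpha = torch.zeros(1, requires_grad=True)  # optimize log(alpha) so alpha stays positive
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def alpha_loss(probs: torch.Tensor, log_probs: torch.Tensor) -> torch.Tensor:
    # Exact policy entropy per state: H(pi) = -sum_a pi(a|s) * log pi(a|s).
    entropy = -(probs * log_probs).sum(dim=-1)
    # Gradient descent on this loss raises alpha while H(pi) < target_entropy
    # and lowers it otherwise; if the target is unreachable, alpha grows forever.
    return (log_alpha.exp() * (entropy - target_entropy).detach()).mean()

# Usage with a dummy near-deterministic policy (entropy far below the target):
probs = (torch.randn(32, num_actions) * 5.0).softmax(dim=-1)
loss = alpha_loss(probs, probs.clamp_min(1e-8).log())
alpha_optimizer.zero_grad()
loss.backward()
alpha_optimizer.step()  # log_alpha increases, i.e. alpha is pushed up
```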

I tried a small set of entropy scaling factors (0.7, 0.8, 0.9, 0.98); the value from the original paper (0.98) clearly seems suboptimal. See the report [here](https://wandb.ai/timo_kk/SAC-discrete/reports/Effect-of-entropy-scale--VmlldzoyNzQ4MDY4?accessToken=0w4q1v5w4ytzfjmbjowmvdxwglf75tvi6ivgk7i02ypssyhh0vxiise0quoithq3). Going forward...
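For reference, here is what those scales translate to as entropy targets, again assuming the full 18-action Atari set (back-of-the-envelope numbers of mine, not results from the report):

```python
import math

n_actions = 18  # assumption: full Atari action set; per-game sets can be smaller
print(f"uniform-policy entropy: {math.log(n_actions):.2f} nats")
for scale in (0.7, 0.8, 0.9, 0.98):
    print(f"scale {scale:.2f} -> target entropy {scale * math.log(n_actions):.2f} nats")
```

At 0.98 the target ($\approx 2.83$ nats) sits just below the entropy of a uniform policy ($\approx 2.89$ nats), so any policy that has learned something can't satisfy it and $\alpha$ keeps climbing; 0.7 leaves roughly 0.87 nats of headroom.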

For me, this feature would also be very useful for copying baselines between different projects.

Also from my side: please implement this; it's important when you're e.g. changing universities.

Thank you very much for the detailed feedback. I'm going to work everything in. The contribution is progressing a bit slowly because I also have other things to do, but the process...

> Hi @timoklein! This is awesome work! I was wondering, have you tested this on a non-Atari discrete environment? I was attempting to run this on discrete Lunar Lander...
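For anyone attempting the same: a rough sketch of what pointing the script at a vector-observation discrete env might involve (hypothetical and untested; the env id and network sizes are my assumptions, and the exact id depends on your gym/gymnasium version):

```python
import gymnasium as gym
import torch.nn as nn

# Discrete Lunar Lander has an 8-dim vector observation and 4 discrete actions,
# so the Atari frame wrappers and the CNN encoder don't apply here.
env = gym.make("LunarLander-v2")  # id is version-dependent; v3 in newer gymnasium
obs_dim = env.observation_space.shape[0]  # 8
n_actions = env.action_space.n            # 4

# Sketch of an MLP trunk replacing the pixel CNN (sizes are arbitrary choices).
encoder = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)
actor_head = nn.Linear(256, n_actions)  # logits of the categorical policy
```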