ddpg Reacher-v1 not training

Hi, I have just tried running Reacher-v1 for 1000000 timesteps with the default settings and it didn't learn anything (it just gets stuck at a test reward of -12). It looks like you got it running with some settings, though. What were those settings?

Dec 15 '16 18:12 amolchanov86

Hey,

Sorry for the late reply! The most important setting, reward normalization, is actually hardcoded into filter_env.py for Reacher-v1. The other hyperparameters should be fine. Have you tried multiple times? Are at least the two pendulum tasks working?
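For readers who want to see what reward normalization can look like in practice, here is a minimal sketch, assuming a plain constant scaling of the reward. It is not the code from filter_env.py: the class names `RewardScaledEnv` and `ConstantRewardEnv` and the factor `0.1` are hypothetical and chosen only for illustration.

```python
class RewardScaledEnv:
    """Wraps an environment and multiplies every reward by a constant factor."""

    def __init__(self, env, reward_scale):
        self.env = env
        self.reward_scale = reward_scale  # hypothetical value, tune per task

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Pass the action through unchanged and rescale only the reward.
        obs, reward, done, info = self.env.step(action)
        return obs, reward * self.reward_scale, done, info


class ConstantRewardEnv:
    """Hypothetical stand-in environment that always returns the same reward."""

    def __init__(self, reward):
        self.reward = reward

    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, self.reward, False, {}


env = RewardScaledEnv(ConstantRewardEnv(-12.0), reward_scale=0.1)
obs, scaled_reward, done, info = env.step(None)
```

The point of such a wrapper is that the critic sees targets in a smaller numeric range, while the environment and the agent code stay untouched.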

Cheers, Simon

Dec 22 '16 03:12 rmst

Hi, thanks for the reply!

I tried only once. OK, I will rerun it. But the thing is, I am seeing the same problem with my own implementation, although all the balancing envs and the hopper worked fine.
Another question: did you try to learn any high-dimensional tasks using DDPG?
And last but not least: correct me if I am wrong, but you haven't tried prioritized experience replay (PER) yet? It is a bit confusing that PER is mentioned under "Improvements beyond the original paper", but from "replay_memory.py" it seems that the replay buffer is just sampled uniformly at random. Thanks a lot!
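To make the difference between the two sampling schemes concrete, here is a minimal sketch, assuming a simple ring buffer and the proportional variant of prioritization. It is not taken from replay_memory.py: `UniformReplayBuffer`, `sample_prioritized` and the `alpha` default are hypothetical names and values used for illustration, and a full PER implementation would also need importance-sampling weights and priority updates, which are left out here.

```python
import random


class UniformReplayBuffer:
    """Fixed-size ring buffer that samples stored transitions uniformly."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.storage = []
        self.next_index = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        # Append until full, then overwrite the oldest entry.
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_index] = transition
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size):
        # Every stored transition is equally likely (no replacement).
        count = min(batch_size, len(self.storage))
        return self.rng.sample(self.storage, count)


def sample_prioritized(transitions, priorities, batch_size, alpha=0.6, rng=random):
    """Sample with probability proportional to priority ** alpha (with replacement)."""
    weights = [p ** alpha for p in priorities]
    return rng.choices(transitions, weights=weights, k=batch_size)


buffer = UniformReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.add(t)
uniform_batch = buffer.sample(2)
prioritized_batch = sample_prioritized(["a", "b"], [0.0, 1.0], batch_size=5)
```

In the prioritized call, the transition with priority 0 is never drawn, which is the behavior that uniform sampling does not give you.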

Dec 23 '16 01:12 amolchanov86

Hey, sorry for the late reply.

I never got Reacher-v1 to count as "solved", but it was close (as you can see in the gif in the readme). For my evaluations I used the commit before "fixes in replay memory", but I don't believe the performance got worse after that commit. I don't use prioritized experience replay. The list of improvements is only a roadmap. I haven't had time to work on it so far, and by now it doesn't seem like such a big improvement compared to other things like auxiliary tasks in A3C. Maybe I will release a new TensorFlow deep RL repo, though, where we can include it.

Ah, and no, I haven't used it with convolutional nets on pixels yet. But that should also come soon (in the new repo, though).

Cheers

Jan 06 '17 19:01 rmst

Hi, thanks for the help!

Jan 08 '17 00:01 amolchanov86