Should the GaussianPolicy output use a tanh() activation?

Open · endseeker opened this issue · 0 comments

With a tanh() activation on the GaussianPolicy output, I was able to get close to, or better than, the official results. (I also changed the cosine-decayed learning rate schedule to run over 5000 steps.)
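For anyone trying to reproduce this, below is a minimal, framework-free sketch of the two changes described above. It is not the repository's code. The assumptions: a single linear layer stands in for the policy network, the log-std is a fixed per-dimension vector, actions are normalized to [-1, 1], and the helper names `tanh_gaussian_mean`, `sample_action` and `cosine_decay_lr` are hypothetical.

```python
import math
import random


def tanh_gaussian_mean(features, weights, bias):
    """Hypothetical policy head: a linear layer followed by tanh().

    tanh() bounds every component of the Gaussian mean to (-1, 1),
    matching an action space assumed to be normalized to [-1, 1].
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]


def sample_action(mean, log_std, rng):
    """Sample a ~ N(mean, exp(log_std)^2) independently per dimension.

    Only the mean is squashed; the Gaussian noise can still push a
    sample outside [-1, 1].
    """
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


def cosine_decay_lr(step, base_lr, total_steps=5000, min_lr=0.0):
    """Cosine decay from base_lr at step 0 to min_lr at total_steps.

    Steps beyond total_steps are clamped, so the rate stays at min_lr.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

Squashing only the mean keeps the log-likelihood a plain Gaussian one, which is the difference from a tanh-squashed distribution. Because sampled actions can still leave [-1, 1], an implementation may need to clip them before stepping the environment. For the schedule, PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=5000` follows the same closed form during the first 5000 steps, but it does not clamp afterwards.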

Jul 04 '22 16:07 endseeker