Asynchronous-Methods-for-Deep-Reinforcement-Learning
Different final epsilons from the paper
The paper states that the final epsilons should be [0.1, 0.01, 0.5], but in your code they are [0.01, 0.01, 0.05] (strangely, 0.01 appears twice). Is this a mistake or an intentional improvement?
I'm tuning the model myself, but I'm not sure which hyperparameters are important.
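
For comparison, here is a minimal sketch of how the paper's per-thread exploration scheme could be written. It assumes each actor thread samples one final epsilon from the paper's values and anneals linearly from 1.0 down to it. The sampling probabilities [0.4, 0.3, 0.3] are my reading of the paper, and the function names and the `anneal_steps` parameter are hypothetical, not taken from this repository.

```python
import random

# Final epsilon values as stated in the paper.
PAPER_FINAL_EPSILONS = [0.1, 0.01, 0.5]
# Assumption: sampling probabilities as I read them from the paper.
PAPER_PROBABILITIES = [0.4, 0.3, 0.3]


def sample_final_epsilon(rng, epsilons=PAPER_FINAL_EPSILONS,
                         probs=PAPER_PROBABILITIES):
    """Pick one final epsilon for an actor thread (hypothetical helper)."""
    return rng.choices(epsilons, weights=probs, k=1)[0]


def annealed_epsilon(step, final_epsilon, anneal_steps, initial_epsilon=1.0):
    """Linearly anneal epsilon from initial_epsilon to final_epsilon.

    After anneal_steps the value stays at final_epsilon.
    """
    fraction = min(max(step / anneal_steps, 0.0), 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


if __name__ == "__main__":
    rng = random.Random(0)
    # One final epsilon per actor thread (4 threads chosen only for the demo).
    finals = [sample_final_epsilon(rng) for _ in range(4)]
    for thread_id, final_eps in enumerate(finals):
        eps_mid = annealed_epsilon(500, final_eps, anneal_steps=1000)
        print(thread_id, final_eps, eps_mid)
```

Swapping `PAPER_FINAL_EPSILONS` for [0.01, 0.01, 0.05] would reproduce the values currently in the code, which makes it easy to compare the two settings while tuning.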