IQL-PyTorch
Should the GaussianPolicy output use a tanh() activation?
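For readers unfamiliar with the question, here is a minimal sketch of what applying tanh() to a Gaussian policy's output could look like. It is not the repository's code: the class name `TanhMeanGaussianPolicy`, the layer sizes, the state-independent `log_std`, and the clamp range are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.distributions import Independent, Normal


class TanhMeanGaussianPolicy(nn.Module):
    """Hypothetical Gaussian policy whose mean is squashed into [-1, 1] by tanh."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, act_dim),
        )
        # Assumed: one learned log standard deviation per action dimension,
        # shared across all states.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> Independent:
        # tanh keeps the mean inside the usual normalized action range [-1, 1].
        mean = torch.tanh(self.net(obs))
        # Clamp range is an assumption; it only keeps the std numerically sane.
        std = self.log_std.clamp(-5.0, 2.0).exp()
        # Treat the action dimensions as one event so log_prob sums over them.
        return Independent(Normal(mean, std), 1)


policy = TanhMeanGaussianPolicy(obs_dim=3, act_dim=2)
dist = policy(torch.randn(4, 3))
log_prob = dist.log_prob(torch.zeros(4, 2))  # one log-probability per batch row
```

Note that only the mean is squashed here; samples drawn from the distribution can still fall outside [-1, 1] and would need clipping before being sent to an environment.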
I was able to get results close to, or better than, the official ones (I also got the cosine-damped learning rate schedule to work in 5000 steps).
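As a rough illustration of a cosine learning rate schedule over 5000 steps, here is a minimal sketch. It assumes "cosine damped" means standard cosine annealing from a base rate down to a minimum; the function name `cosine_lr` and the base rate of 3e-4 are hypothetical and not taken from the comment above.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate: base_lr at step 0, min_lr at total_steps."""
    # Clamp so that steps past the end of the schedule stay at min_lr.
    step = min(max(step, 0), total_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cosine


# Assumed values: a 5000-step schedule starting from a hypothetical 3e-4.
schedule = [cosine_lr(step, 5000, 3e-4) for step in range(5001)]
```

In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5000)` should produce the same curve when stepped once per update.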