IQL-PyTorch
A PyTorch implementation of Implicit Q-Learning
In the function return_range, the end of a trajectory is marked either by the terminal signal or by the number of time steps reaching max_episode_steps. However, as the dataset is extracted from D4RL's...
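The issue above refers to the repo's return_range helper. A minimal sketch of the episode-splitting logic being discussed (the function name comes from the issue; the dict keys "rewards"/"terminals" and the exact return value are assumptions about the D4RL-style dataset layout, not the repo's actual code) might look like:

```python
def return_range(dataset, max_episode_steps):
    """Compute min/max undiscounted episode returns in a D4RL-style dataset.

    An episode ends either when the terminal flag is set or when its
    length reaches max_episode_steps (a timeout) -- the two end-of-
    trajectory conditions the issue describes.
    """
    returns, lengths = [], []
    ep_ret, ep_len = 0.0, 0
    for r, d in zip(dataset["rewards"], dataset["terminals"]):
        ep_ret += float(r)
        ep_len += 1
        if d or ep_len == max_episode_steps:
            returns.append(ep_ret)
            lengths.append(ep_len)
            ep_ret, ep_len = 0.0, 0
    lengths.append(ep_len)  # count any partial trailing episode
    # every transition must be assigned to exactly one episode
    assert sum(lengths) == len(dataset["rewards"])
    return min(returns), max(returns)
```

The issue's point is that a timeout is not a true terminal state, so treating the two identically can mislabel episode boundaries when the dataset's step counter does not align with max_episode_steps.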
I was able to get results close to, or better than, the official ones (I also got the cosine-decayed learning rate schedule to work in 5000 steps).
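The comment above mentions a cosine-decayed learning rate over 5000 steps. One common closed form for such a schedule (the function name and the base_lr/min_lr values are illustrative placeholders, not the commenter's actual settings) is:

```python
import math

def cosine_lr(step, total_steps=5000, base_lr=3e-4, min_lr=0.0):
    """Cosine-annealed learning rate: starts at base_lr,
    decays smoothly to min_lr by total_steps."""
    frac = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * frac))
```

In PyTorch the same shape is available out of the box via torch.optim.lr_scheduler.CosineAnnealingLR with T_max set to the total step count.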
Thanks for your work. Does your code also perform well on the AntMaze environments?