Add Retrace and a QNetwork abstraction
Following our discussion of yesterday at #613, I'm creating this draft PR to show how I went on implementing the Retrace Algorithm. More precisely, I wanted to implement Retrace as a plug-in that can optionally be used by an algorithm (I have #604 in mind mainly) but such a level of abstraction was not implemented yet. So here's my attempt. I assume that this may not be what you have in mind @findmyway but it can be a useful discussion even if this is never merged.
The main idea is that Retrace is just a different way than TD(n) to compute update targets for a QNetwork. So I created a QNetwork abstraction that can be called to be updated given a batch of action, states and targets : update!(qnetwork::AbstractQNetwork, states, actions, targets).
The main goal here was the introduction of the function q_targets, called by update!.
Now, why did I do this? Because then I could implement retrace in way that is reusable by ab algorithm that uses the QNetwork abstraction. RetraceTrajectory is an extension of a Trajectory, using a new type allows to overload q_targets. So an algorithm that uses a RetraceTrajectory will automatically use this target to update its AbstractQNetwork.
I'll leave this PR as a draft and only use it locally for now. When we move to 0.11 I'll adapt to make Retrace work with it.
I like where this is going. Would it make sense to make something like QNetworkWithTarget <: AbstractQNetwork to abstract away the target network and soft updates (not necessarily in this PR, just in general)? Also, can you please add a reference to the Retrace paper in a comment somewhere? I believe I read this paper a while ago, but it would be nice to have a reference for convenience.
Would it make sense to make something like QNetworkWithTarget <: AbstractQNetwork to abstract away the target network and soft updates
Yes, I think it's a choice that would make sense. Now that I think of it, it would make more sense to go this way and even define two QNetworkWithTarget, one that uses polyak averaging and one that copies the weights every k updates, since both approaches are common.
For the ref, you can find the paper here. Vtrace and Impala would be nice additions in the future (https://arxiv.org/abs/1802.01561) too. These pertain a lot to distributed RL that I think is about to undergo major changes in this package soon.
Thanks! This is very helpful!