Henri Dehaybe
As I was testing MPO on the cartpole environment, I noticed the algorithm was pretty unstable and had trouble stabilizing at the 200-return policy. I eventually thought about the...
I'm opening this as a draft so discussions are possible early. This implements the MPO algorithm from [this paper](https://arxiv.org/abs/1806.06920) and [its improved version](https://arxiv.org/abs/1812.02256).
PR Checklist
- [ ] Update NEWS.md?...
Following yesterday's discussion in #613, I'm creating this draft PR to show how I went about implementing the Retrace algorithm. More precisely, I wanted to implement Retrace as...
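```julia
using Distributions  # provides Uniform, Gamma, and quantile

τs = Float32.(collect(1:99)./(100))         # quantile levels 0.01, ..., 0.99
xs = Float32.(rand(Uniform(0,100), 1, 64))  # 1×64 matrix of Float32 parameters
dists = Gamma.(xs, 2*xs)                    # one Gamma{Float32} per entry
quantile.(dists, τs)
```
throws
```
ERROR: MethodError: no method matching gammainvcdf(::Float32, ::Float32, ::Float32)
Stacktrace:
 [1] quantile(::Gamma{Float32}, ::Float32)...
```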
If you check, all recent (not so recent anymore) PRs are blocked due to a CI test failing on AppVeyor. I don't have access to the details (access denied). I used...
Currently, rolling functions return an array on the CPU regardless of the type of the input array. This means that the result must be moved back to the GPU after...
Hello, I'm pretty sure this is a documentation issue and that the functionality exists internally, but I can't figure out how to evaluate a DecisionRule with a state that's not...
This PR adds a function to compute the Retrace Bellman operator as described in [this paper](https://paperswithcode.com/paper/safe-and-efficient-off-policy-reinforcement). It is described as an algorithm, so I put it in RLZoo, but it...
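For reference, the Retrace operator from the linked paper is usually written as below. This is the paper's definition as I understand it, not a description of the PR's code, which is not shown in this excerpt. The symbols are the paper's notation: target policy $\pi$, behavior policy $\mu$, discount $\gamma$, trace parameter $\lambda$.

$$
\mathcal{R}Q(x, a) = Q(x, a) + \mathbb{E}_{\mu}\left[ \sum_{t \ge 0} \gamma^{t} \left( \prod_{s=1}^{t} c_s \right) \left( r_t + \gamma\, \mathbb{E}_{\pi} Q(x_{t+1}, \cdot) - Q(x_t, a_t) \right) \right],
\qquad
c_s = \lambda \min\left( 1, \frac{\pi(a_s \mid x_s)}{\mu(a_s \mid x_s)} \right)
$$

Here $(x_0, a_0) = (x, a)$ and the empty product at $t = 0$ equals 1. Truncating the importance ratio at 1 keeps the trace coefficients bounded, which is what lets the operator be used with arbitrary off-policy data.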
Given the discussion in #960, I'd like to make a list of the RLCore components that I feel are missing or incomplete.
- [x] Target networks: they are currently named TwinNetwork. This...