Henri Dehaybe issues

Results: 15 issues of Henri Dehaybe

tanh normalization destabilizes learning with GaussianNetwork

As I was testing MPO on the cartpole environment, I noticed the algorithm was pretty unstable and had trouble stabilizing at the 200-return policy. I eventually thought about the...

WIP: Add MPO in zoo

I'm opening this as a draft so discussions are possible early. This implements the MPO algorithm from [this paper](https://arxiv.org/abs/1806.06920) and [its improved version](https://arxiv.org/abs/1812.02256).

PR Checklist
- [ ] Update NEWS.md?...

enhancement

RLZoo

WIP

Make documentation on trace normalization

To do when the new Trajectories are merged here.

doc

Add Retrace and a QNetwork abstraction

Following our discussion yesterday in #613, I'm creating this draft PR to show how I went about implementing the Retrace algorithm. More precisely, I wanted to implement Retrace as...

Quantile function for Gamma distribution not defined for Float32 input

```julia
τs = Float32.(collect(1:99)./(100))
xs = Float32.(rand(Uniform(0,100), 1, 64))
dists = Gamma.(xs, 2*xs)
quantile.(dists, τs)
```

throws

```
ERROR: MethodError: no method matching gammainvcdf(::Float32, ::Float32, ::Float32)
Stacktrace:
 [1] quantile(::Gamma{Float32}, ::Float32)...
```

Remove AppVeyor

If you check, all recent (not so recent anymore) are blocked due to a CI test failing on AppVeyor. I don't have access to the detail (access denied). I used...

Compatibility with CUDA?

Currently, rolling functions return an array on the CPU regardless of the type of the input array. This means that the result must be moved back to the GPU after...

Evaluating at a multidimensional state?

Hello, I'm pretty sure this is a documentation issue and that the functionality exists internally, but I can't figure out how to evaluate a DecisionRule with a state that's not...

add retrace

This PR adds a function to compute the Retrace Bellman operator as described in [this paper](https://paperswithcode.com/paper/safe-and-efficient-off-policy-reinforcement). It is described as an algorithm, so I put it in RLZoo, but it...

Missing features in RLCore

Given the discussion in #960, I'd make a list of components of RLCore that I feel are missing or incomplete. - [x] Target networks: they are currently named TwinNetwork. This...

v0.12