vime issues

Could the algorithm be used on reinforcement learning algorithms with experience replay?

Could the algorithm be used on reinforcement learning algorithms with experience replay?

Reproducing the Halfcheetahx experiment

1

Hi, can I ask if anybody has reproduced the results of the Halfcheetahx experiment in the VIME paper? Could anybody share the hyperparameters you chose? Thanks!

xht033

What's the difference between run_trpo.py and run_trpo_expl.py?

If I understand correctly, run_trpo_expl.py is TRPO + VIME, so run_trpo.py is TRPO without VIME?

gd-zhang

Could you add some information to recreate the state space figure?

5

This is the figure I'm referring to:

mrdrozdov

Enhancement: rewriting the mutual information as in BALD [Houlsby et al 2011]

1

[In discussion with Jose Miguel Hernandez-Lobato @jmhernandezlobato and Daniel Hernandez-Lobato @danielhernandezlobato] The current exploration objective used in the paper is a sum of expected reductions in entropy of the parameters...

thangbui
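
For readers unfamiliar with BALD, here is a hedged sketch of the rewriting the issue title refers to. The post above is truncated, so this is the standard mutual-information identity rather than the exact formulation proposed in the issue. The notation is an assumption: \Theta denotes the dynamics-model parameters, \xi_t the history up to time t, a_t the action, and S_{t+1} the next state.

```latex
% Per-step information gain: expected reduction in parameter entropy
I(S_{t+1}; \Theta \mid \xi_t, a_t)
  = H(\Theta \mid \xi_t, a_t)
  - \mathbb{E}_{s_{t+1} \sim p(\cdot \mid \xi_t, a_t)}
      \big[ H(\Theta \mid \xi_t, a_t, s_{t+1}) \big]

% Mutual information is symmetric, so the same quantity can be written
% with entropies over next states instead of over parameters (BALD form)
I(S_{t+1}; \Theta \mid \xi_t, a_t)
  = H(S_{t+1} \mid \xi_t, a_t)
  - \mathbb{E}_{\theta \sim p(\cdot \mid \xi_t)}
      \big[ H(S_{t+1} \mid \xi_t, a_t, \theta) \big]
```

The second form only needs entropies of the predictive distribution over next states, which can be easier to estimate than entropies over a high-dimensional parameter posterior.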
