imitation
AIRL discriminator output does not match Fu's AIRL paper
Problem
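The AIRL discriminator in this repository is a standard (GAN-style) discriminator. It does not follow the structured discriminator from Fu's paper, which is what lets AIRL recover rewards that are robust to changes in the environment dynamics.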
Solution
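As stated in Fu's paper, the recovered reward function depends only on the state, and the discriminator should include the discount factor gamma. Define f(s, a, s') = r(s) + gamma * V(s') - V(s), where r(s) is the state-only reward approximator and V is a learned shaping term. The discriminator is then D(s, a, s') = exp(f(s, a, s')) / (exp(f(s, a, s')) + pi(a|s)), where pi is the current policy.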
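The following is a minimal NumPy sketch of the structured discriminator described above. It assumes the reward estimate r(s), the shaping values V(s) and V(s'), and log pi(a|s) have already been produced by some networks. The function name `airl_discriminator` and the `dones` mask (treating V of a terminal next state as zero) are illustrative assumptions. They are not the imitation library's API.

```python
import numpy as np


def airl_discriminator(
    reward: np.ndarray,        # r(s): state-only reward estimate, shape (batch,)
    value_s: np.ndarray,       # V(s): shaping term at the current state, shape (batch,)
    value_next_s: np.ndarray,  # V(s'): shaping term at the next state, shape (batch,)
    log_pi: np.ndarray,        # log pi(a|s) under the current policy, shape (batch,)
    dones: np.ndarray,         # 1.0 where s' is terminal, else 0.0 (assumption)
    gamma: float = 0.99,
):
    """Return (f, logit, D) for an AIRL-style structured discriminator.

    f(s, a, s') = r(s) + gamma * V(s') - V(s)
    D = exp(f) / (exp(f) + pi(a|s)) = sigmoid(f - log pi(a|s))
    """
    # Assumption: the shaping term of a terminal next state is treated as zero.
    f = reward + gamma * (1.0 - dones) * value_next_s - value_s
    # Working in logit space avoids computing exp(f) and pi(a|s) directly.
    logit = f - log_pi
    # Numerically stable sigmoid: sigmoid(x) = exp(-log(1 + exp(-x))).
    d = np.exp(-np.logaddexp(0.0, -logit))
    return f, logit, d


# Usage with made-up numbers, for illustration only.
r = np.array([1.0, 0.5])
v_s = np.array([0.2, 0.0])
v_next = np.array([0.4, 1.0])
log_pi = np.log(np.array([0.5, 0.25]))
dones = np.array([0.0, 1.0])
f, logit, d = airl_discriminator(r, v_s, v_next, log_pi, dones, gamma=0.9)
```

Because D = sigmoid(f - log pi(a|s)), the returned logit equals log D - log(1 - D), which the paper uses as the policy's reward during training. The state-only r(s) is the part the paper proposes reusing under changed dynamics.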

I am close to running my AIRL code. Could you give me some hints on how you changed the discriminator formula? Any comments or insights would be appreciated. Thanks in advance.