Discriminator output for Fu's AIRL paper is wrong

Open Dormiveglia-elf opened this issue 2 years ago • 1 comments

Problem

The AIRL discriminator implemented here is just a standard one. It does not correspond to the one in Fu's paper, which is designed to recover a reward that is robust to changes in dynamics.

Solution

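As stated in Fu's paper, the final reward function depends only on the state, and the corresponding discriminator should include the discount factor gamma. Define f(s, a, s') = r(s) + gamma * V(s') - V(s), where r is the state-only reward and V is a learned shaping term. The discriminator is then D = exp(f) / (exp(f) + pi(a|s)).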
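To make the proposed change concrete, here is a minimal sketch of that discriminator for a single transition, in plain Python. The function names `airl_logit` and `airl_discriminator` and their arguments are hypothetical and do not come from this repository. In practice `reward` and the two value terms would be outputs of learned networks, and zeroing V(s') on terminal transitions is an assumed convention.

```python
import math


def airl_logit(reward, value_s, value_next, log_pi, gamma=0.99, done=False):
    """Return f(s, a, s') - log pi(a|s), the logit of the AIRL discriminator."""
    # Assumption: the shaping term V(s') is dropped on terminal transitions.
    next_v = 0.0 if done else value_next
    # Shaped term from the issue: f = r + gamma * V(s') - V(s).
    f = reward + gamma * next_v - value_s
    return f - log_pi


def airl_discriminator(reward, value_s, value_next, log_pi, gamma=0.99, done=False):
    """Return D = exp(f) / (exp(f) + pi(a|s)), computed as sigmoid(f - log pi)."""
    logit = airl_logit(reward, value_s, value_next, log_pi, gamma, done)
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Example transition: r(s) = 1.0, V(s) = 0.5, V(s') = 2.0, pi(a|s) = 0.25.
d = airl_discriminator(1.0, 0.5, 2.0, math.log(0.25), gamma=0.9)
```

Writing D as a sigmoid of f - log pi(a|s) is algebraically the same as exp(f) / (exp(f) + pi(a|s)), and it lets the logit be passed directly to a standard binary cross-entropy loss on logits.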

Apr 15 '23 04:04 Dormiveglia-elf

I am close to running my AIRL code. Could you please give me some hints on how you changed the discriminator formula? Any comments or insights would be appreciated. Thanks in advance.

Jul 13 '23 17:07 roeslib