Discriminator output does not match Fu's AIRL paper

Open Dormiveglia-elf opened this issue 2 years ago • 1 comment

Problem

The AIRL discriminator here is just a regular discriminator. It does not correspond to the one in Fu's paper, which is designed to recover rewards that are robust to changes in dynamics.

Solution

As stated in Fu's paper, the final reward function depends only on the state, and the corresponding discriminator should include the discount parameter gamma. Define f(s, s') = r(s) + gamma * V(s') - V(s), where s' is the next state.
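
For illustration, here is a minimal sketch of how that f could feed the discriminator. It assumes the form D = exp(f) / (exp(f) + pi(a|s)) from Fu's paper, which is the same as a sigmoid of f - log pi(a|s). The function names `airl_f` and `airl_discriminator` are hypothetical and are not taken from this repository; in real code r and V would be network outputs rather than plain floats.

```python
import math


def airl_f(reward, value_s, value_next, gamma=0.99):
    """Shaped term f(s, s') = r(s) + gamma * V(s') - V(s).

    reward     -- r(s), the state-only reward estimate
    value_s    -- V(s), the shaping term at the current state
    value_next -- V(s'), the shaping term at the next state
    """
    return reward + gamma * value_next - value_s


def airl_discriminator(f_value, log_pi):
    """D = exp(f) / (exp(f) + pi(a|s)), computed as sigmoid(f - log pi(a|s)).

    log_pi is the log-probability of the taken action under the current policy.
    """
    logit = f_value - log_pi
    # Numerically stable sigmoid: avoid exp() of a large positive number.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Usage with made-up numbers.
f_val = airl_f(reward=1.0, value_s=0.5, value_next=2.0, gamma=0.9)
d_val = airl_discriminator(f_val, log_pi=math.log(0.5))
```

A regular discriminator outputs a single logit directly from its inputs; the difference in this sketch is only that the logit is assembled from r, V, gamma and the policy's log-probability.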

Dormiveglia-elf, Apr 15 '23 04:04

I am close to running my AIRL code. Could you please give me some hints on how you changed the discriminator formula? Any comments or insights would be appreciated. Thanks in advance.

roeslib, Jul 13 '23 17:07