Inverse-Reinforcement-Learning

Are equation 9.2 in Ziebart's thesis and the find_policy() function the same?

Open tessavdheiden opened this issue 3 years ago • 0 comments

Hi Matthew!

This repo is just great: it works, and it's transparent and modular!

I only found two differences between Ziebart's thesis and your implementation. Can you let me know if you were aware of them?

So here is Eq 9.2: Screenshot 2022-06-07 at 11 12 54

Here is your code: Screenshot 2022-06-07 at 11 10 21

And here is Eq 9.1: Screenshot 2022-06-07 at 11 12 59

It uses $V^{\text{soft}}$: Screenshot 2022-06-07 at 11 17 22

And here is your code: Screenshot 2022-06-07 at 11 10 30

You include a discount factor in Eq 9.2, and in 9.1 you convert a subtraction ($Q^{\text{soft}}-V^{\text{soft}}$) into a fraction ($\frac{Q^{\text{soft}}}{V^{\text{soft}}}$), correct?
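To make the second point concrete, here is a minimal sketch of why a subtraction inside the exponent and a normalising fraction can give the same policy. It assumes that $V^{\text{soft}}(s) = \log \sum_a \exp Q^{\text{soft}}(s, a)$ and that the policy is $\exp(Q^{\text{soft}} - V^{\text{soft}})$. Both are assumptions about what the screenshots show, not a copy of the thesis or of the repo's code. The function names and the random `Q` table are hypothetical, for illustration only. Under those assumptions the fraction is $\frac{\exp Q^{\text{soft}}}{\exp V^{\text{soft}}}$ rather than $\frac{Q^{\text{soft}}}{V^{\text{soft}}}$.

```python
import numpy as np


def policy_by_subtraction(Q):
    """pi(a|s) = exp(Q(s, a) - V(s)), with V(s) = log sum_a exp Q(s, a).

    The log-sum-exp form of V is an assumption made for this sketch.
    """
    # Shift by the row maximum so the exponentials cannot overflow.
    row_max = Q.max(axis=1, keepdims=True)
    V = row_max + np.log(np.exp(Q - row_max).sum(axis=1, keepdims=True))
    return np.exp(Q - V)


def policy_by_fraction(Q):
    """pi(a|s) = exp Q(s, a) / sum_a' exp Q(s, a'), i.e. a row-wise softmax."""
    # The same row-maximum shift cancels in the ratio, so the result is unchanged.
    exp_Q = np.exp(Q - Q.max(axis=1, keepdims=True))
    return exp_Q / exp_Q.sum(axis=1, keepdims=True)


# Hypothetical Q table: 4 states, 3 actions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 3))

pi_sub = policy_by_subtraction(Q)
pi_frac = policy_by_fraction(Q)

# True if the two forms agree up to floating-point error.
print(np.allclose(pi_sub, pi_frac))
```

If the two forms agree here, the subtraction versus fraction difference would only be a change of notation, and the discount factor would be the one real difference left to confirm.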

tessavdheiden, Jun 07 '22 10:06