About RL+LSTM

Open 4359hhh opened this issue 3 years ago • 1 comments

Hello. As I understand the Markov property, the agent's current action should depend only on the current state s_t, but the policy network in your open-source RL+LSTM code takes both s_t and a_{t-1} as input. Does this type of algorithm still converge during training? Thank you again for your open-source code.
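To make the question concrete, here is a minimal sketch of the input layout I mean. It is not the repository's code: `LSTMPolicy`, `rollout`, and all dimensions are hypothetical names and values chosen for illustration, the weights are random and untrained, and it uses plain NumPy rather than the framework the repository uses. The point it shows is that the network sees s_t and a_{t-1} together with its recurrent memory, so the action is a function of that augmented input rather than of s_t alone.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMPolicy:
    """Toy recurrent policy (hypothetical): input is concat(s_t, a_{t-1})."""

    def __init__(self, state_dim, action_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + action_dim
        # One stacked weight matrix for the four LSTM gates
        # (input, forget, candidate, output), acting on [s_t, a_{t-1}, h_{t-1}].
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, in_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Linear head mapping the hidden state to an action.
        self.W_out = rng.normal(0.0, 0.1, size=(action_dim, hidden_dim))
        self.hidden_dim = hidden_dim
        self.action_dim = action_dim

    def initial_memory(self):
        # (h_0, c_0): hidden and cell state start at zero.
        return np.zeros(self.hidden_dim), np.zeros(self.hidden_dim)

    def step(self, state, last_action, memory):
        h, c = memory
        x = np.concatenate([state, last_action, h])
        i, f, g, o = np.split(self.W @ x + self.b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        action = np.tanh(self.W_out @ h)  # deterministic action in [-1, 1]
        return action, (h, c)


def rollout(policy, states):
    """Feed a sequence of states, passing a_{t-1} and the memory forward."""
    memory = policy.initial_memory()
    last_action = np.zeros(policy.action_dim)  # assumed a_{-1} = 0
    actions = []
    for s in states:
        last_action, memory = policy.step(s, last_action, memory)
        actions.append(last_action)
    return np.array(actions)
```

For example, `rollout(LSTMPolicy(3, 2, 8), np.ones((5, 3)))` returns five actions for five identical states, and they need not be equal, because a_{t-1} and the memory change from step to step.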

Jul 20 '22 14:07 4359hhh

I see that this question is already explained in the references you cite.

Jul 20 '22 14:07 4359hhh