Popular-RL-Algorithms

About RL+LSTM

Open 4359hhh opened this issue 3 years ago • 1 comments

Hello. The Markov property means that the agent's current action depends only on the current state s_t, but the policy network in your open-source RL+LSTM code takes both s_t and a_{t-1} as input. Does this type of algorithm converge during training? Thank you again for your open-source code.

4359hhh, Jul 20 '22 14:07

I see that this problem has been explained in the references you give.

4359hhh, Jul 20 '22 14:07
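
For readers landing on this thread, here is a minimal sketch of the input layout the question describes: the policy receives s_t concatenated with a one-hot encoding of a_{t-1}, plus a hidden state carried over from earlier steps. It does not reproduce the repository's code. The names `policy_input` and `TinyRecurrentPolicy` are hypothetical, the weights are random and untrained, and a simple tanh (Elman-style) recurrent cell stands in for the LSTM to keep the example dependency-free. Feeding a_{t-1} and a hidden state makes the policy a function of history; this is commonly done when observations may be partial, and it is an input-design choice rather than a statement about whether the environment itself is Markov.

```python
import math
import random


def one_hot(index, size):
    """Return a list of length `size` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def policy_input(state, last_action, num_actions):
    """Concatenate s_t with a one-hot encoding of a_{t-1} (discrete actions assumed)."""
    return list(state) + one_hot(last_action, num_actions)


class TinyRecurrentPolicy:
    """Hypothetical stand-in for an LSTM policy: a single tanh recurrent cell."""

    def __init__(self, state_dim, num_actions, hidden_dim, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + num_actions
        self.num_actions = num_actions
        self.hidden_dim = hidden_dim

        def matrix(rows, cols):
            # Small random weights; a real policy would learn these.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = matrix(hidden_dim, in_dim)
        self.w_h = matrix(hidden_dim, hidden_dim)
        self.w_out = matrix(num_actions, hidden_dim)

    def initial_hidden(self):
        """Hidden state used at the first step of an episode."""
        return [0.0] * self.hidden_dim

    def step(self, state, last_action, hidden):
        """Return (action probabilities, new hidden state) for one time step."""
        x = policy_input(state, last_action, self.num_actions)
        new_hidden = [
            math.tanh(
                sum(w * v for w, v in zip(self.w_in[j], x))
                + sum(w * v for w, v in zip(self.w_h[j], hidden))
            )
            for j in range(self.hidden_dim)
        ]
        logits = [sum(w * v for w, v in zip(row, new_hidden)) for row in self.w_out]
        # Numerically stable softmax over the action logits.
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return [e / total for e in exps], new_hidden


policy = TinyRecurrentPolicy(state_dim=3, num_actions=2, hidden_dim=4)
hidden = policy.initial_hidden()
probs, hidden = policy.step([0.1, -0.2, 0.3], last_action=0, hidden=hidden)
```

Calling `step` again with the same `state` and `last_action` but the updated `hidden` generally gives different action probabilities, which is the sense in which such a policy depends on more than s_t alone.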