Lijun Wu
Thanks for your quick response. I have one other question: I noticed you set the gradient norm clip to 40. Could you tell me how to choose this number? Thanks...
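For readers following the gradient clipping question, here is a minimal sketch of clipping by global norm in plain Python. The threshold 40 is the value mentioned in the comment; the function name `clip_by_global_norm` and the list-of-lists gradient layout are illustrative assumptions, and the comment does not say whether the original code clips globally or per tensor.

```python
import math

def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is at most clip_norm.

    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    # Joint L2 norm over every element of every gradient vector.
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= clip_norm:
        # Already within the limit: return copies unchanged.
        return [list(vec) for vec in grads], global_norm
    # Shrink all gradients by the same factor so their direction is preserved.
    scale = clip_norm / global_norm
    return [[g * scale for g in vec] for vec in grads], global_norm

# Hypothetical gradients with joint norm 50, clipped to the threshold 40.
clipped, norm_before = clip_by_global_norm([[30.0, 40.0], [0.0]], 40.0)
```

One common way to pick the threshold (a heuristic, not something stated in this thread) is to log `norm_before` over a training run and choose a value somewhat above the typical norm, so that only unusually large updates get rescaled.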
Hi, Miyosuda, I have one question about testing. At test time, the action is still chosen by sampling randomly according to the policy's distribution, but why...
@lforg37 Thanks for your reply. Can I understand it like this: whether to act deterministically is decided by the task. In some tasks a deterministic policy may be better, while in the game...
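The two action-selection modes discussed in these comments can be sketched as follows. This is an illustration only, assuming the policy output `pi` is a list of action probabilities; the helper name `choose_action` and its `deterministic` flag are hypothetical and not taken from the original code.

```python
import random

def choose_action(pi, deterministic=False, rng=random):
    """Pick an action index from a list of action probabilities.

    deterministic=True  -> greedy: the index with the highest probability.
    deterministic=False -> stochastic: sample an index in proportion to pi.
    """
    actions = range(len(pi))
    if deterministic:
        # Greedy choice; ties resolve to the lowest index.
        return max(actions, key=lambda a: pi[a])
    # Sample one index with probability proportional to its weight.
    return rng.choices(actions, weights=pi, k=1)[0]

# Hypothetical policy output over three actions.
pi = [0.1, 0.7, 0.2]
greedy_action = choose_action(pi, deterministic=True)
sampled_action = choose_action(pi, rng=random.Random(0))
```

Which mode works better at test time depends on the task, as the comment above suggests, so it may be worth evaluating both on the same saved model.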
Hi, @miyosuda, I have one more problem. In your original code, the model is saved only when we press "ctrl+c". I want to save intermediate models, so I modified...
@miyosuda Thanks a lot. I think it will help.
@miyosuda @lforg37, I am currently using this framework for another problem (the optimal policy will be a deterministic policy), but I replaced the CNN with an FF network and use an LSTM,...
@sahiliitm, thanks for your response. I agree with your idea for training; I am just curious about the performance of these two approaches. Just like my question in the last comment,...
@sahiliitm owo, I think your result is good. Could you explain more about how you choose actions during training and testing? That would be very helpful. Thanks a lot. @miyosuda,...
@miyosuda, I used `MultiRNNCell` and added one ordinary LSTM layer after the first LSTM layer. Currently it looks like this:
```python
self.lstm = CustomBasicLSTM(hidden_size)
self.stacked_lstm = rnn_cell.MultiRNNCell([self.lstm]*2)
```
...