Apollo
AlphaZero is a great piece of work! Are you implementing AlphaZero now?
What is the main difference between AlphaGo Zero and AlphaZero? Is the MCTS architecture the same?
Aren't softmax((0, -0.5, 0.5))[1:] and softmax((-0.5, 0.5)) unequal when legal_mask = (0, 1, 1)?
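A quick NumPy check (my own sketch, not code from the repo) shows the two quantities do differ, because the illegal move's logit still contributes to the softmax denominator; renormalizing the sliced probabilities over the legal set recovers the legal-only softmax:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([0.0, -0.5, 0.5])
legal_mask = np.array([0, 1, 1])

# Softmax over all three logits, then slice out the legal entries.
sliced = softmax(logits)[1:]

# Softmax over only the legal logits.
legal_only = softmax(logits[legal_mask == 1])

print(sliced)               # does not sum to 1 (illegal logit inflated the denominator)
print(legal_only)           # sums to 1
print(sliced / sliced.sum())  # renormalizing the slice matches legal_only
```

So masking illegal moves (e.g. by setting their logits to -inf before the softmax) is equivalent to taking the softmax over the legal moves only, while slicing after a full softmax without renormalizing is not.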
Right, that's it. Thanks.
Looking at https://github.com/mokemokechicken/reversi-alpha-zero/blob/527ce6ce1b83175c8b2c34c6b51334a67b02c9b1/src/reversi_zero/worker/self_play.py#L63-L64 I see the two players use the same model in self-play mode.
I see DeepMind backs the reward up to the parent nodes without modification. Why not use a discount rate γ?
But I think that when the game is long, the first move is much less correlated with the final result than the last move is.
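To make the question concrete, here is a minimal sketch (my own illustration, not the repo's actual code) of an MCTS-style backup with an optional discount γ. AlphaZero uses γ = 1.0, so every node on the path receives the terminal value z unchanged; with γ < 1.0, moves far from the end of the game would receive a shrunken signal. The per-ply sign flip used in two-player self-play is omitted for brevity:

```python
class Node:
    """Minimal MCTS node for illustration: visit count and total value."""
    def __init__(self):
        self.n = 0    # visit count N
        self.w = 0.0  # total backed-up value W

def backup(path, z, gamma=1.0):
    """Back the leaf value z up the root-to-leaf path.

    gamma = 1.0 reproduces the undiscounted AlphaZero backup;
    gamma < 1.0 attenuates the value as it travels toward the root.
    """
    v = z
    for node in reversed(path):  # leaf first, root last
        node.n += 1
        node.w += v
        v *= gamma

# Example: a 3-node path with a win (z = 1) and gamma = 0.9.
path = [Node(), Node(), Node()]
backup(path, 1.0, gamma=0.9)
print([node.w for node in path])  # root 0.81, middle 0.9, leaf 1.0
```

With γ = 1.0 all three nodes would receive 1.0, which is what the AlphaZero paper describes: the game outcome itself is treated as the target everywhere along the trajectory.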
@gooooloo what does the ntest:6 result (6/1/3) mean for step-418500?
@gooooloo thanks for the reply. I see your model is very good. My model can't beat NTest at depth 5 yet. What are your policy and value losses? Mine are (0.15, 0.1) now.
@gooooloo Um... but you use the game history, don't you?