Apollo
AlphaZero is a great piece of work! Are you implementing AlphaZero now?
What is the main difference between AlphaGo Zero and AlphaZero? Is the MCTS architecture the same?
Aren't softmax((0, -0.5, 0.5))[1:] and softmax((-0.5, 0.5)) unequal when legal_mask = (0, 1, 1)?
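A quick NumPy check (my own sketch, not code from the repo) shows the two quantities do differ, because the illegal move's logit still contributes to the softmax denominator; renormalizing the sliced probabilities over the legal set recovers the legal-only softmax:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([0.0, -0.5, 0.5])
legal_mask = np.array([0, 1, 1])

# Softmax over all three logits, then slice out the legal entries.
sliced = softmax(logits)[1:]

# Softmax over only the legal logits.
legal_only = softmax(logits[legal_mask == 1])

print(sliced)               # does not sum to 1 (illegal logit inflated the denominator)
print(legal_only)           # sums to 1
print(sliced / sliced.sum())  # renormalizing the slice matches legal_only
```

So masking illegal moves (e.g. by setting their logits to -inf before the softmax) is equivalent to taking the softmax over the legal moves only, while slicing after a full softmax without renormalizing is not.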
Right, that's it. Thanks.
Looking at https://github.com/mokemokechicken/reversi-alpha-zero/blob/527ce6ce1b83175c8b2c34c6b51334a67b02c9b1/src/reversi_zero/worker/self_play.py#L63-L64 I see the two players use the same model in self-play mode.
I see DeepMind backs the reward up to the parent nodes without modification. Why not use a discount rate γ?
But I think that when the game is long, the first move is much less correlated with the final result than the last move is.
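To make the question concrete, here is a minimal sketch (my own illustration, not the repo's actual code) of an MCTS-style backup with an optional discount γ. AlphaZero uses γ = 1.0, so every node on the path receives the terminal value z unchanged; with γ < 1.0, moves far from the end of the game would receive a shrunken signal. The per-ply sign flip used in two-player self-play is omitted for brevity:

```python
class Node:
    """Minimal MCTS node for illustration: visit count and total value."""
    def __init__(self):
        self.n = 0    # visit count N
        self.w = 0.0  # total backed-up value W

def backup(path, z, gamma=1.0):
    """Back the leaf value z up the root-to-leaf path.

    gamma = 1.0 reproduces the undiscounted AlphaZero backup;
    gamma < 1.0 attenuates the value as it travels toward the root.
    """
    v = z
    for node in reversed(path):  # leaf first, root last
        node.n += 1
        node.w += v
        v *= gamma

# Example: a 3-node path with a win (z = 1) and gamma = 0.9.
path = [Node(), Node(), Node()]
backup(path, 1.0, gamma=0.9)
print([node.w for node in path])  # root 0.81, middle 0.9, leaf 1.0
```

With γ = 1.0 all three nodes would receive 1.0, which is what the AlphaZero paper describes: the game outcome itself is treated as the target everywhere along the trajectory.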
@gooooloo what does the ntest:6 result (6/1/3) mean for step-418500?
@gooooloo thanks for the reply. I see your model is very good. My model can't beat NTest at depth 5 yet. What are your policy and value losses? Mine are (0.15, 0.1) now.
@gooooloo Um... but you use the game history, don't you?