Apollo — 36 comments

AlphaZero is great work! Are you implementing AlphaZero now?

What is the main difference between AlphaGo Zero and AlphaZero? Is the MCTS architecture the same?

Isn't softmax((0, -0.5, 0.5))[1:2] unequal to softmax((-0.5, 0.5)) when legal_mask = (0, 1, 1)?
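For what it's worth, a quick sketch (plain Python, my own `softmax` helper, not the repo's code) suggests the two quantities do differ unless the sliced probabilities are renormalized over the legal entries:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of logits
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [0.0, -0.5, 0.5]

full = softmax(logits)            # softmax over all three logits
sliced = full[1:]                 # legal entries only, NOT renormalized
renorm = [p / sum(sliced) for p in sliced]  # renormalize over legal mass
legal_only = softmax(logits[1:])  # softmax over the legal logits directly

# sliced != legal_only, but renorm == legal_only:
# slicing a softmax drops the illegal move's probability mass,
# so the legal entries no longer sum to 1 until renormalized.
```

So masking after the softmax only matches masking the logits first if the remaining probabilities are renormalized.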

https://github.com/mokemokechicken/reversi-alpha-zero/blob/527ce6ce1b83175c8b2c34c6b51334a67b02c9b1/src/reversi_zero/worker/self_play.py#L63-L64 I see that both players use the same model in self-play mode.

I see DeepMind backs up the reward to parent nodes unmodified. Why not use a discount rate γ?

But I think the first move is much less correlated with the final result than the final move is, when the game is long.
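To illustrate the question, here is a minimal sketch (my own hypothetical `backup_targets` helper, not DeepMind's or the repo's code): AlphaZero assigns the same terminal outcome z as the value target for every position in the game, which corresponds to γ = 1, while γ < 1 would weaken the signal for early moves:

```python
def backup_targets(num_moves, z, gamma=1.0):
    """Value targets for each position t in a game of num_moves moves.

    gamma = 1.0 reproduces the AlphaZero-style backup: every position
    gets the raw outcome z.  With gamma < 1, positions far from the
    end of the game receive a discounted (weaker) target.
    """
    return [z * gamma ** (num_moves - 1 - t) for t in range(num_moves)]

undiscounted = backup_targets(4, 1.0)             # [1.0, 1.0, 1.0, 1.0]
discounted = backup_targets(4, 1.0, gamma=0.5)    # [0.125, 0.25, 0.5, 1.0]
```

The undiscounted choice treats the first move as fully responsible for the final result; a discount would encode the intuition above that early moves matter less.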

@gooooloo what does the ntest:6 result (6/1/3) mean for step-418500?

@gooooloo thanks for the reply. I see your model is very good; mine can't beat NTest at depth 5 yet. What are your policy and value losses? Mine are (0.15, 0.1) now.

@gooooloo Um... but you use the game history, don't you?