Apollo

Results: 7 issues by Apollo

https://github.com/mokemokechicken/reversi-alpha-zero/blob/5ee2f330663b34513f0c894eb658f03a1201f400/src/reversi_zero/agent/player.py#L115-L121 At first I thought this code searches with simulation_num_per_move threads at the same time, but I see the async functions are not called from multiple threads. How about...
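A minimal sketch of the pattern the question is about, with illustrative names (not the repo's actual code): many simulations run as coroutines on one event loop in a single thread, and each yields while waiting for a neural-network prediction, which is what lets the searches overlap without any multithreading.

```python
import asyncio

async def fake_predict(state):
    # stand-in for a batched NN evaluation; awaiting it suspends this
    # simulation so other simulations can proceed on the same thread
    await asyncio.sleep(0)
    return 0.0  # dummy value

async def run_simulation(root, sim_id):
    # one MCTS simulation: descend, evaluate the leaf, back up (elided)
    return await fake_predict((root, sim_id))

async def search(root, simulation_num_per_move=8):
    # launch all simulations concurrently on the event loop
    tasks = [run_simulation(root, i) for i in range(simulation_num_per_move)]
    return await asyncio.gather(*tasks)

values = asyncio.run(search("root"))
print(len(values))  # 8
```

So the concurrency here is cooperative (one thread, many coroutines), not preemptive threading.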

I see my model isn't improving anymore. Moreover, as [ThomasWAnthony's](https://www.reddit.com/r/MachineLearning/comments/76xjb5/ama_we_are_david_silver_and_julian_schrittwieser/dolnq31/) comment says, I found that "It may forget pertinent information about positions that it no longer visits" when the opponent selects actions unusually....

https://github.com/mokemokechicken/reversi-alpha-zero/blob/5ee2f330663b34513f0c894eb658f03a1201f400/src/reversi_zero/agent/model.py#L48 I see the policy softmax is calculated over all moves, including illegal ones. How can I calculate the softmax over only the legal moves, e.g. by adding a placeholder for a legal-move mask?
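One common way to do this (a sketch in NumPy, not the repo's code) is to mask the logits of illegal moves to negative infinity before the softmax, so they receive exactly zero probability:

```python
import numpy as np

def masked_softmax(logits, legal_mask):
    """logits: (n,) float array; legal_mask: (n,) bool array of legal moves."""
    masked = np.where(legal_mask, logits, -np.inf)
    masked = masked - masked[legal_mask].max()  # subtract max for stability
    exp = np.exp(masked)                        # exp(-inf) == 0
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0])
legal = np.array([True, False, True, False])
p = masked_softmax(logits, legal)
print(p[1], p[3])  # 0.0 0.0: illegal moves get zero probability
```

In TensorFlow the same idea applies: feed the legal-move mask as an extra input and add a large negative number to the illegal logits before the softmax layer.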

https://github.com/mokemokechicken/reversi-alpha-zero/blob/f1cfa6c7177ec5f76a89e20fd97eb4c5d678611d/src/reversi_zero/agent/player.py#L165-L168 I see N and W are updated with a virtual loss when a node is selected, in order to discourage other threads from simultaneously exploring the identical variation (as described in the paper). 1. Why...

Thanks for sharing your code. I'll train the model myself with TensorFlow. How can I get the training image dataset?

I can't find single_play.py. How can I start training it in self-play mode? And have you trained with the AlphaGo Zero method, and what were the results? Thanks.

https://github.com/Akababa/Chess-Zero/blob/90a5aad05656131506239388557b9f60d16235a3/src/chess_zero/worker/self_play.py#L33-L41 I see you create a list of max_processes pipe groups, each holding search_threads pipes, and launch max_processes self-play processes with that list. Each self-player takes its own pipe group by popping...
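A hedged sketch of that setup (names are illustrative, not the repo's actual API): build `max_processes` groups of `search_threads` pipes up front, then each self-play worker pops one whole group so its search threads each have a dedicated channel to the prediction server.

```python
import multiprocessing as mp

def make_pipe_groups(max_processes, search_threads):
    # one group of search_threads pipes per self-play process
    return [[mp.Pipe() for _ in range(search_threads)]
            for _ in range(max_processes)]

def self_play_worker(pipes, results):
    # each worker owns search_threads pipe endpoints for NN predictions;
    # here we just report how many we received
    results.put(len(pipes))

if __name__ == "__main__":
    max_processes, search_threads = 2, 4
    groups = make_pipe_groups(max_processes, search_threads)
    results = mp.Queue()
    workers = []
    while groups:
        # pop() hands each process its own exclusive pipe group
        p = mp.Process(target=self_play_worker, args=(groups.pop(), results))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()
    print(results.get() + results.get())  # 8: each worker got 4 pipes
```

Popping from the shared list in the parent (before the processes start) is what guarantees no two workers share a pipe.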