Multiprocess and Model API Pipe
https://github.com/Akababa/Chess-Zero/blob/90a5aad05656131506239388557b9f60d16235a3/src/chess_zero/worker/self_play.py#L33-L41
I see that you create a list of max_processes pipe groups, each containing search_threads pipes, and launch max_processes self-play processes that share that list. Each self-play worker takes its own pipe group by popping it from the list. https://github.com/Akababa/Chess-Zero/blob/90a5aad05656131506239388557b9f60d16235a3/src/chess_zero/worker/self_play.py#L86-L87
Moreover, you keep reusing the same pipe list continuously. https://github.com/Akababa/Chess-Zero/blob/90a5aad05656131506239388557b9f60d16235a3/src/chess_zero/worker/self_play.py#L56
Is it OK for multiple processes to pop from the same list? And why create a shared list with Manager.list and send that list to the players, instead of sending a pipe list directly? Like this:
```python
futures.append(executor.submit(self_play_buffer, self.config,
                               cur=self.current_model.get_pipes(self.config.play.search_threads)))

# and

def self_play_buffer(config, cur) -> (ChessEnv, list):
    pipes = cur  # without pop
```
I hear you; I was trying to do that for the longest time before giving up and using shared manager.
The process pool will only have 3 workers running at a time, so it's safe to pop once for each (I append the pipes back at the end). If I'm not mistaken, calling get_pipes every time would leak pipes.
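A minimal sketch of that borrow-and-return pattern, with made-up strings standing in for the real pipe groups (`self_play_stub` and the pool contents are illustrative, not the project's code):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager


def self_play_stub(shared_pool):
    # Borrow one pipe group from the shared pool. Each ListProxy method
    # call is a single serialized request to the manager process, so
    # concurrent pops from different workers are safe.
    pipes = shared_pool.pop()
    try:
        return pipes  # a real worker would play one game here
    finally:
        shared_pool.append(pipes)  # hand the group back for the next game


if __name__ == "__main__":
    with Manager() as manager:
        # Three groups for a pool of three workers: a running game can
        # always pop a free group without ever emptying the list.
        pool = manager.list(["pipes-0", "pipes-1", "pipes-2"])
        with ProcessPoolExecutor(max_workers=3) as executor:
            results = list(executor.map(self_play_stub, [pool] * 6))
        print(len(results), len(pool))  # 6 3
```

A Manager proxy also pickles cleanly into child processes, which is why passing the shared list works where passing raw pipe objects through a plain argument can be awkward.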
Oh, I see, you append the pipe back at the end. https://github.com/Akababa/Chess-Zero/blob/90a5aad05656131506239388557b9f60d16235a3/src/chess_zero/worker/self_play.py#L118 And I think Python is not fast enough to implement the search tree with multiprocessing. https://github.com/Zeta36/chess-alpha-zero/issues/13 Was the good result trained using SL or RL?
Yeah, Python is slow, but since I already maxed out my GPU usage I will just keep using it for now. When we do distributed self-play we will most likely need C++ like Leela Zero, though (@benediamond has already started implementing this part). Supervised, although it may be just luck, as I'm having trouble reproducing it. I tried a whole bunch of stuff along the way, and sometimes the most unintuitive things just work for no reason.
Thanks for the reply. I am trying to multiprocess the select_action_q_and_u part of the tree search, without expansion; it does not use the GPU, only the CPU. So on a multi-core CPU this search could be faster with multiprocessing. Also, do you believe the residual model can be trained with SL to play chess moves that are not in the training data? Moreover, for RL, can the residual model be trained on chess moves that were never passed during self-play?
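As a rough sketch of that idea: the per-edge Q + U score is pure CPU arithmetic, so in principle it can be mapped over worker processes. The edge statistics and the c_puct constant below are made-up illustrative values, and `q_plus_u` is only a stand-in for the project's select_action_q_and_u; note that process overhead may well outweigh the gain for such small work items:

```python
import math
from multiprocessing import Pool

# Hypothetical per-edge statistics for one MCTS node:
# (prior P, visit count N, total action value W)
EDGES = [
    (0.4, 10, 6.0),
    (0.3, 5, 2.0),
    (0.3, 1, 0.9),
]
C_PUCT = 1.5  # exploration constant (assumed value)


def q_plus_u(args):
    """PUCT score Q + U for a single edge; pure CPU work, so it can
    be farmed out to worker processes."""
    (p, n, w), sqrt_total = args
    q = w / n if n else 0.0
    u = C_PUCT * p * sqrt_total / (1 + n)
    return q + u


if __name__ == "__main__":
    sqrt_total = math.sqrt(sum(n for _, n, _ in EDGES))  # sqrt(16) = 4
    with Pool(2) as pool:
        scores = pool.map(q_plus_u, [(e, sqrt_total) for e in EDGES])
    best = max(range(len(EDGES)), key=scores.__getitem__)
    print(best)  # 2: the low-visit edge wins on its exploration bonus
```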
Nice, I look forward to seeing it! I'm not sure what your question is. The SL policy is just 1 for the human move and 0 for everything else. And chess has no "passing" allowed.
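That one-hot SL target can be sketched like this (MOVE_LABELS here is a toy subset; the real project enumerates the full move labeling):

```python
import numpy as np

# Toy move labeling for illustration only.
MOVE_LABELS = ["e2e4", "d2d4", "g1f3", "e2e3"]


def sl_policy_target(human_move):
    """Supervised policy target: 1.0 for the move the human played,
    0.0 for every other move in the labeling."""
    policy = np.zeros(len(MOVE_LABELS), dtype=np.float32)
    policy[MOVE_LABELS.index(human_move)] = 1.0
    return policy


print(sl_policy_target("g1f3"))  # [0. 0. 1. 0.]
```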
My question is whether the residual model can choose good moves in a board state that does not appear in the training data.
https://github.com/Akababa/Chess-Zero/wiki#model-diagram Oh, I think I understand now. The board state (except for threefold repetition) can be inferred from the moves leading up to it. I left out the history planes for simplicity and to combat overfitting.
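A toy illustration of that point, with a made-up one-piece "board" instead of real chess rules: replaying the move list recovers the current placement, while repetition counting inherently needs the whole history:

```python
from collections import Counter


def apply_move(board, move):
    """Move a piece from one square to another (toy rules, no legality)."""
    src, dst = move
    board = dict(board)
    board[dst] = board.pop(src)
    return board


def replay(start, moves):
    """Return every position reached; the full history is a pure
    function of the move list, so placement needs no extra input."""
    positions = [start]
    for mv in moves:
        positions.append(apply_move(positions[-1], mv))
    return positions


start = {"g1": "N"}                          # a lone knight
shuffle = [("g1", "f3"), ("f3", "g1")] * 2   # out and back, twice

positions = replay(start, shuffle)
keys = [tuple(sorted(p.items())) for p in positions]

print(positions[-1] == start)   # True: placement recovered from moves
print(Counter(keys)[keys[-1]])  # 3: only the history reveals repetition
```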