
Multiprocess and Model API Pipe

Open apollo-time opened this issue 8 years ago • 7 comments

https://github.com/Akababa/Chess-Zero/blob/90a5aad05656131506239388557b9f60d16235a3/src/chess_zero/worker/self_play.py#L33-L41

I see you create a list of max_processes pipe groups, each containing search_threads pipes, and start max_processes self-play processes with that list. Each self-player takes its own pipe group by popping it from the list. https://github.com/Akababa/Chess-Zero/blob/90a5aad05656131506239388557b9f60d16235a3/src/chess_zero/worker/self_play.py#L86-L87

Moreover, you keep reusing the same pipe list continuously. https://github.com/Akababa/Chess-Zero/blob/90a5aad05656131506239388557b9f60d16235a3/src/chess_zero/worker/self_play.py#L56

Is it OK for multiple processes to pop from the same list? And why create the shared list by calling Manager.list and send it to the players, instead of sending each player its own pipe list directly? Like this:

futures.append(executor.submit(self_play_buffer, self.config, cur=self.current_model.get_pipes(self.config.play.search_threads)))
# and
def self_play_buffer(config, cur) -> (ChessEnv, list):
    pipes = cur # without pop
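For reference, the behavioral difference between the two approaches can be sketched like this (a toy example with plain integers standing in for the real pipe objects, not the repo's code): a plain list passed to a worker process is pickled into that process, so a pop there is invisible to the parent, while a Manager.list is a shared proxy whose mutations everyone sees.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager

def pop_one(shared, plain):
    # The Manager list is a proxy: this pop is seen by the parent process.
    shared.pop()
    # The plain list was pickled into this worker: this pop stays local.
    plain.pop()

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.list([1, 2, 3])
        plain = [1, 2, 3]
        with ProcessPoolExecutor(max_workers=1) as executor:
            executor.submit(pop_one, shared, plain).result()
        print(len(shared), len(plain))  # 2 3
```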

apollo-time avatar Dec 25 '17 12:12 apollo-time

I hear you; I was trying to do that for the longest time before giving up and using shared manager.

The process pool will only have 3 running at a time, so it's safe to pop once for each (I append it back at the end). If I'm not mistaken, calling get_pipes every time would leak pipes.

Akababa avatar Dec 25 '17 15:12 Akababa

Oh, I see, you append the pipe back at the end. https://github.com/Akababa/Chess-Zero/blob/90a5aad05656131506239388557b9f60d16235a3/src/chess_zero/worker/self_play.py#L118 Also, I think Python isn't fast enough to implement the search tree with multiprocessing. https://github.com/Zeta36/chess-alpha-zero/issues/13 Was the good result trained using SL or RL?

apollo-time avatar Dec 26 '17 02:12 apollo-time

Yeah, Python is slow, but since I already maxed out my GPU usage I will just keep using it for now. When we do distributed self-play we will most likely need C++ like Leela Zero, though (@benediamond has already started implementing this part). Supervised, although it may be just luck, as I'm having trouble reproducing it. I tried a whole bunch of stuff along the way and sometimes the most unintuitive things just work for no reason.

Akababa avatar Dec 26 '17 02:12 Akababa

Thanks for the reply. I'm trying to multiprocess the select_action_q_and_u part of the tree search, without the expansion step; it uses no GPU, only CPU, so on a multi-core CPU the search can be faster with multiprocessing. Also, do you believe the residual model can be trained with SL to play chess moves that are not in the training data? Moreover, for RL, can the residual model be trained on chess moves that are never passed during self-play?

apollo-time avatar Dec 26 '17 02:12 apollo-time

Nice, I look forward to seeing it! I'm not sure what your question is. The SL policy is just 1 for the human move and 0 for everything else. And chess has no "passing" allowed.
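In other words, the SL target is a one-hot distribution over the move space. A minimal illustration (toy 4-move action space, not the real move encoding):

```python
def sl_policy_target(move_index, n_moves):
    # 1.0 for the move the human actually played, 0.0 for every other move
    return [1.0 if i == move_index else 0.0 for i in range(n_moves)]

# Toy action space of 4 moves where the human played move 2:
print(sl_policy_target(2, 4))  # [0.0, 0.0, 1.0, 0.0]
```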

Akababa avatar Dec 26 '17 02:12 Akababa

My question is whether the residual model can be trained to play moves in board positions that do not appear in the training data.

apollo-time avatar Dec 26 '17 02:12 apollo-time

https://github.com/Akababa/Chess-Zero/wiki#model-diagram Oh, I think I understand now. The board status (except for threefold repetition) can be inferred from the moves leading up to it. I left out the history planes for simplicity and to combat overfitting.

Akababa avatar Dec 26 '17 02:12 Akababa