GFNet
model parallel training
Why can I only replicate the whole model on each GPU, rather than use model parallelism, where parts of the model are distributed across different GPUs?
I followed the instructions on the webpage.