Bo Zheng
@ArthurZucker Hi, I have rebased our branch and resolved all the conflicts. I think the code is ready to be merged now.
> Hi @bozheng-hit, recommend using [eval.ai](https://eval.ai/web/challenges/challenge-page/1697/leaderboard) for official test results. We will be opening submissions to the MMNLU-22 phases soon.

Is it possible to return results for all languages separately?
BTW, is there an example of how to enable moe_weight_parallelism in megablocks?
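Something like the following is what I have in mind (an untested sketch, to be launched with torchrun); it assumes the megablocks `Arguments` dataclass exposes `moe_weight_parallelism` and `weight_parallel_group`, and the shapes/group setup here are just placeholders:

```python
import torch
import torch.distributed as dist
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

# Launched via torchrun; for brevity, put every rank into one weight-parallel group.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
weight_parallel_group = dist.new_group(ranks=list(range(dist.get_world_size())))

args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=4096,
    moe_num_experts=8,
    moe_top_k=2,
    moe_weight_parallelism=True,                  # assumed flag: shard expert weights
    weight_parallel_group=weight_parallel_group,  # assumed field: group to shard over
)
layer = dMoE(args).cuda().to(torch.bfloat16)      # half precision for the sparse kernels

x = torch.randn(8, 512, 1024, device="cuda", dtype=torch.bfloat16)
out = layer(x)  # depending on bias/return_bias, the output may be a (tensor, bias) tuple
```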
I think it is caused by numerical precision. Is there a plan to support MoE weight parallelism in the Megatron-LM fork?
I'd like to use weight parallelism in dMoE with SwiGLU layers. I noticed that weight parallelism is not supported with GLU yet.
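For concreteness, this is roughly the configuration I am trying to run (again an untested sketch); it assumes `mlp_type="glu"` selects the gated MLP and that `activation_fn=F.silu` gives the SwiGLU variant:

```python
import torch.distributed as dist
import torch.nn.functional as F
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

# Same torchrun-style setup as in the sketch above.
dist.init_process_group(backend="nccl")
weight_parallel_group = dist.new_group(ranks=list(range(dist.get_world_size())))

args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=2816,
    moe_num_experts=8,
    moe_top_k=2,
    mlp_type="glu",                               # gated MLP; SwiGLU with silu activation
    activation_fn=F.silu,
    moe_weight_parallelism=True,                  # assumed flag, as above
    weight_parallel_group=weight_parallel_group,
)
layer = dMoE(args)  # expected to fail for now, since weight parallelism is not yet supported with GLU
```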
Yes, expert model parallelism does not support the distributed optimizer, and it uses the data parallel group, which is inconvenient for larger-scale pre-training.