Bo Zheng
@ArthurZucker Hi, I have rebased our branch and resolved all the conflicts. I think the code is ready to be merged now.
> Hi @bozheng-hit, recommend using [eval.ai](https://eval.ai/web/challenges/challenge-page/1697/leaderboard) for official test results. We will be opening submissions to the MMNLU-22 phases soon.

Is it possible to return results for all languages separately?
BTW, is there an example of how to enable moe_weight_parallelism in megablocks?
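Something like the following is what I have in mind (an untested sketch, to be launched with torchrun); it assumes the megablocks `Arguments` dataclass exposes `moe_weight_parallelism` and `weight_parallel_group`, and the shapes/group setup here are just placeholders:

```python
import torch
import torch.distributed as dist
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

# Launched via torchrun; for brevity, put every rank into one weight-parallel group.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
weight_parallel_group = dist.new_group(ranks=list(range(dist.get_world_size())))

args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=4096,
    moe_num_experts=8,
    moe_top_k=2,
    moe_weight_parallelism=True,                  # assumed flag: shard expert weights
    weight_parallel_group=weight_parallel_group,  # assumed field: group to shard over
)
layer = dMoE(args).cuda().to(torch.bfloat16)      # half precision for the sparse kernels

x = torch.randn(8, 512, 1024, device="cuda", dtype=torch.bfloat16)
out = layer(x)  # depending on bias/return_bias, the output may be a (tensor, bias) tuple
```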
I think it is caused by numerical precision. Is there a plan to support MoE weight parallelism in the Megatron-LM fork?
I'd like to use weight parallelism in dMoE with SwiGLU layers. I noticed that weight parallelism is not supported with GLU yet.
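For concreteness, this is roughly the configuration I am trying to run (again an untested sketch); it assumes `mlp_type="glu"` selects the gated MLP and that `activation_fn=F.silu` gives the SwiGLU variant:

```python
import torch.distributed as dist
import torch.nn.functional as F
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

# Same torchrun-style setup as in the sketch above.
dist.init_process_group(backend="nccl")
weight_parallel_group = dist.new_group(ranks=list(range(dist.get_world_size())))

args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=2816,
    moe_num_experts=8,
    moe_top_k=2,
    mlp_type="glu",                               # gated MLP; SwiGLU with silu activation
    activation_fn=F.silu,
    moe_weight_parallelism=True,                  # assumed flag, as above
    weight_parallel_group=weight_parallel_group,
)
layer = dMoE(args)  # expected to fail for now, since weight parallelism is not yet supported with GLU
```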
Yes, expert model parallelism does not support the distributed optimizer, and it uses the data parallel group, which is inconvenient for larger-scale pre-training.