[ENHANCEMENT] Is there any support for training MoE models that use the Qwen2MoeForCausalLM architecture from the transformers library?
Is your feature request related to a problem? Please describe. I have seen support for training MoE models in Megatron, including scripts for the Mixtral 8x7B model: https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/moe.html.
However, at Alibaba we are interested in training Qwen2MoeForCausalLM MoE models with Megatron, where the model follows the architecture code in the transformers library: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_moe/modeling_qwen2_moe.py.
Is such support available?
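For reference, here is a minimal sketch of the architecture we mean, assuming the publicly released Qwen1.5-MoE-A2.7B checkpoint (the checkpoint id is just one example; any Qwen2-MoE checkpoint loads the same class):

```python
# Minimal sketch: loading a Qwen2-MoE checkpoint from Hugging Face.
# The checkpoint id below is an example/assumption; substitute your own.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen1.5-MoE-A2.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The resolved class is Qwen2MoeForCausalLM, i.e. the architecture
# we would like to train with Megatron.
print(type(model).__name__)
```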
Marking as stale. No activity in 60 days.
Do you still have this feature request? Maybe we can help.
As of May 13, 2025, is it still not supported? I hope it will be supported in the future; for now we are forced to use DeepSpeed for training.
@liangshaopeng you can find our latest Qwen3-MoE examples here: https://github.com/yanring/Megatron-MoE-ModelZoo.
Are you interested in starting from the Hugging Face model or just looking for similar Qwen MoE support in Megatron?
This feature has been implemented. Megatron-LM now provides full support for Qwen-MoE models.
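For anyone starting from the Hugging Face checkpoint, the MoE hyperparameters that have to be matched on the Megatron side can be read directly from the HF config. A rough sketch follows; the checkpoint id and the Megatron flag names in the comments are assumptions, so verify them against your Megatron-LM version and the Megatron-MoE-ModelZoo scripts:

```python
# Sketch: read the MoE hyperparameters from a Hugging Face Qwen2-MoE config.
# Field names follow Qwen2MoeConfig in transformers; the Megatron argument
# names mentioned in comments are assumptions and may differ by version.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")

moe_fields = {
    "num_experts": cfg.num_experts,                      # e.g. Megatron --num-experts
    "num_experts_per_tok": cfg.num_experts_per_tok,      # e.g. Megatron --moe-router-topk
    "moe_intermediate_size": cfg.moe_intermediate_size,  # per-expert FFN hidden size
    "shared_expert_intermediate_size": getattr(
        cfg, "shared_expert_intermediate_size", None     # Qwen2-MoE shared expert
    ),
}
print(moe_fields)
```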