[ENHANCEMENT] Is there any support for training MoE models that use the Qwen2MoeForCausalLM architecture from the transformers library?
Is your feature request related to a problem? Please describe. I have seen support for training MoE models in Megatron, including scripts for the Mixtral 8x7B model: https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/moe.html.
However, at Alibaba we are interested in training Qwen2MoeForCausalLM MoE models with Megatron, where the model follows the architecture code in the transformers library: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_moe/modeling_qwen2_moe.py.
Is such support available?
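For reference, here is a minimal sketch of the architecture we mean, assuming the publicly released Qwen1.5-MoE-A2.7B checkpoint (the checkpoint id is just one example; any Qwen2-MoE checkpoint loads the same class):

```python
# Minimal sketch: loading a Qwen2-MoE checkpoint from Hugging Face.
# The checkpoint id below is an example/assumption; substitute your own.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen1.5-MoE-A2.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The resolved class is Qwen2MoeForCausalLM, i.e. the architecture
# we would like to train with Megatron.
print(type(model).__name__)
```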
Marking as stale. No activity in 60 days.
Do you still have this feature request? Maybe we can help.
As of May 13, 2025, is it still not supported? I hope it will be supported in the future; for now we are forced to use DeepSpeed for training.
@liangshaopeng you can find our latest Qwen3-MoE examples here: https://github.com/yanring/Megatron-MoE-ModelZoo.
Are you interested in starting from the Hugging Face model or just looking for similar Qwen MoE support in Megatron?
This feature has been implemented. Megatron-LM now provides full support for Qwen-MoE models.
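For anyone starting from the Hugging Face checkpoint, the MoE hyperparameters that have to be matched on the Megatron side can be read directly from the HF config. A rough sketch follows; the checkpoint id and the Megatron flag names in the comments are assumptions, so verify them against your Megatron-LM version and the Megatron-MoE-ModelZoo scripts:

```python
# Sketch: read the MoE hyperparameters from a Hugging Face Qwen2-MoE config.
# Field names follow Qwen2MoeConfig in transformers; the Megatron argument
# names mentioned in comments are assumptions and may differ by version.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")

moe_fields = {
    "num_experts": cfg.num_experts,                      # e.g. Megatron --num-experts
    "num_experts_per_tok": cfg.num_experts_per_tok,      # e.g. Megatron --moe-router-topk
    "moe_intermediate_size": cfg.moe_intermediate_size,  # per-expert FFN hidden size
    "shared_expert_intermediate_size": getattr(
        cfg, "shared_expert_intermediate_size", None     # Qwen2-MoE shared expert
    ),
}
print(moe_fields)
```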