Qwen3 VL MoE support
Any plans to support Qwen3-VL-30B-A3B or Qwen3-VL-235B-A22B?
Hi @iqiancheng,
We have initial support for Qwen3-VL-30B-A3B with FSDP2 (please see here for recipes), and we are planning to also support the 235B variant.
For the 235B variant, we are planning to support DeepEP + pipeline parallelism to enable training at that scale.
Thanks for your patient reply, @akoumpa. I didn't find any recipes for Qwen3-VL-30B-A3B under the examples directory; are you referring to the recipe 'qwen3_omni_moe_30b_te_deepep.yaml'?
Hi @iqiancheng,
You can override the model ID, for example by passing --model.pretrained_model_name_or_path Qwen/Qwen3-VL-30B-A3B-Instruct on the CLI with the qwen3 configs. However, since this is an MoE model, I think it would be better to use DeepEP.
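For reference, a minimal sketch of such an override, assuming a torchrun-style launcher; the training-script and config paths below are placeholders (not actual file names from this repo), and only the --model.pretrained_model_name_or_path override comes from this thread:

```bash
# Sketch only: <path/to/finetune.py> and <path/to/qwen3_recipe>.yaml are
# placeholders for whichever qwen3 recipe you use.
torchrun --nproc-per-node=8 <path/to/finetune.py> \
  --config <path/to/qwen3_recipe>.yaml \
  --model.pretrained_model_name_or_path Qwen/Qwen3-VL-30B-A3B-Instruct
```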
CC @HuiyingLi
Hi @iqiancheng, I will look at a recipe for qwen3vl30b next week.
The 30B recipes have been merged; I'm keeping this open for the 235B variant.