Qwen3 VL MoE support
Any plans to support Qwen3-VL-30B-A3B or Qwen3-VL-235B-A22B?
Hi @iqiancheng,
We have initial support for Qwen3-VL-30B-A3B with FSDP2 (please see here for recipes), and we are planning to also support the 235B variant.
For the 235B variant, we are planning to support DeepEP + pipeline parallelism to enable training at that scale.
Thanks for your patient reply, @akoumpa. I didn't find any recipes for Qwen3-VL-30B-A3B under the examples directory; are you referring to the recipe 'qwen3_omni_moe_30b_te_deepep.yaml'?
Hi @iqiancheng,
You can override the model ID, for example by passing --model.pretrained_model_name_or_path Qwen/Qwen3-VL-30B-A3B-Instruct on the CLI with the qwen3 configs. However, since this is an MoE model, I think it would be better to use DeepEP.
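For reference, a minimal sketch of such an override, assuming a torchrun-style launcher; the training-script and config paths below are placeholders (not actual file names from this repo), and only the --model.pretrained_model_name_or_path override comes from this thread:

```bash
# Sketch only: <path/to/finetune.py> and <path/to/qwen3_recipe>.yaml are
# placeholders for whichever qwen3 recipe you use.
torchrun --nproc-per-node=8 <path/to/finetune.py> \
  --config <path/to/qwen3_recipe>.yaml \
  --model.pretrained_model_name_or_path Qwen/Qwen3-VL-30B-A3B-Instruct
```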
CC @HuiyingLi
Hi @iqiancheng, I will look at a recipe for qwen3vl30b next week.
The 30B recipes have been merged; I'm keeping this open for the 235B variant.