Qwen3 VL MoE support

Open iqiancheng opened this issue 3 months ago • 5 comments

Are there any plans to support Qwen3-VL-30B-A3B or Qwen3-VL-235B-A22B?

iqiancheng avatar Nov 07 '25 08:11 iqiancheng

Hi @iqiancheng ,

We have initial support for Qwen3-VL-30B-A3B with FSDP2 (please see here for recipes), and we are planning to also support the 235B variant.

For the 235B variant, we are planning to support deepep combined with pipeline parallelism to enable training at that scale.

akoumpa avatar Nov 10 '25 00:11 akoumpa

Thanks for your patient reply, @akoumpa. I didn't find any recipes for 'Qwen3-VL-30B-A3B' under the examples directory. Are you referring to the recipe 'qwen3_omni_moe_30b_te_deepep.yaml'?

iqiancheng avatar Nov 11 '25 06:11 iqiancheng

Hi @iqiancheng ,

You can override the model ID with the qwen3 configs, for example by passing --model.pretrained_model_name_or_path Qwen/Qwen3-VL-30B-A3B-Instruct on the CLI. However, since this is an MoE model, I think it would be better to use deepep.
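To illustrate what a dotted override of this kind does conceptually, here is a minimal Python sketch. It is not Automodel's actual argument parser: the `apply_override` helper and the contents of `base_config` are hypothetical, and only the key name `model.pretrained_model_name_or_path` and the model ID come from this thread. The assumption is that the dotted key addresses a nested field of the YAML recipe.

```python
import copy


def apply_override(config: dict, dotted_key: str, value) -> dict:
    """Return a copy of `config` with the nested field named by `dotted_key` set to `value`.

    Hypothetical helper for illustration only; not part of Automodel.
    """
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Descend into nested sections, creating them if they are missing.
        node = node.setdefault(name, {})
    node[leaf] = value
    return updated


# Hypothetical stand-in for a recipe loaded from YAML.
base_config = {
    "model": {"pretrained_model_name_or_path": "placeholder/model-id"},
}

overridden = apply_override(
    base_config,
    "model.pretrained_model_name_or_path",
    "Qwen/Qwen3-VL-30B-A3B-Instruct",
)
print(overridden["model"]["pretrained_model_name_or_path"])
```

The original recipe dictionary is left untouched, so the same base config can be reused with different model IDs.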

CC @HuiyingLi

akoumpa avatar Nov 14 '25 18:11 akoumpa

Hi @iqiancheng, I will look at a recipe for Qwen3-VL-30B-A3B next week.

HuiyingLi avatar Nov 14 '25 18:11 HuiyingLi

The 30B recipes have been merged. I will keep this issue open for the 235B variant.

akoumpa avatar Nov 25 '25 06:11 akoumpa