
[Bug] Unable to create Qwen3 MoE model

Open casper-hansen opened this issue 4 months ago • 4 comments

How do I get ART to instantiate the model with FastModel instead of FastLanguageModel in Unsloth? (The Unsloth docs say: "If you're fine-tuning the MoE models, please use FastModel and not FastLanguageModel.") I seem to be running into a model-loading issue, as seen in the error below.

model = art.TrainableModel(
    name="001-script",
    project="testing",
    base_model="Qwen/Qwen3-30B-A3B-Thinking-2507",
    _internal_config=art.dev.InternalModelConfig(
        init_args=art.dev.InitArgs(
            load_in_4bit=False,
            max_seq_length=65536,
        ),
        engine_args=art.dev.EngineArgs(
            max_model_len=65536,
            tensor_parallel_size=8,
            gpu_memory_utilization=0.75,
        ),
    ),
)
await model.register(backend)

Error:

Traceback (most recent call last):
  File "/root/openpipe/train.py", line 337, in <module>
    asyncio.run(train())
  File "/root/openpipe/.venv/lib/python3.11/site-packages/nest_asyncio.py", line 30, in run
    return loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/openpipe/.venv/lib/python3.11/site-packages/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib/python3.11/asyncio/tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/root/openpipe/train.py", line 296, in train
    await model.register(backend)
  File "/root/openpipe/.venv/lib/python3.11/site-packages/art/model.py", line 335, in register
    base_url, api_key = await backend._prepare_backend_for_training(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/openpipe/.venv/lib/python3.11/site-packages/art/local/backend.py", line 282, in _prepare_backend_for_training
    await service.start_openai_server(config=config)
  File "/root/openpipe/.venv/lib/python3.11/site-packages/mp_actors/traceback.py", line 26, in async_wrapper
    raise e.with_traceback(streamlined_traceback())
  File "/root/openpipe/.venv/lib/python3.11/site-packages/art/unsloth/service.py", line 60, in start_openai_server
    self.state.trainer.save_model(lora_path)
^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
^^^^^^^
  File "/root/openpipe/.venv/lib/python3.11/site-packages/art/unsloth/service.py", line 45, in state
    return ModelState(self.config)
  ^^^^^^^^^^^^^^^^^
  File "/root/openpipe/.venv/lib/python3.11/site-packages/art/unsloth/state.py", line 82, in __init__
    unsloth.FastLanguageModel.from_pretrained(**config.get("init_args", {})),
^^^^^^^^^^^^^^^
  File "/root/openpipe/.venv/lib/python3.11/site-packages/unsloth/models/loader.py", line 397, in from_pretrained
    return FastModel.from_pretrained(
^^^^^^^^^^^^^^^
  File "/root/openpipe/.venv/lib/python3.11/site-packages/unsloth/models/loader.py", line 930, in from_pretrained
    model, tokenizer = FastBaseModel.from_pretrained(
  ^^^^^^^^^^^^^^^^^
  File "/root/openpipe/.venv/lib/python3.11/site-packages/unsloth/models/vision.py", line 621, in from_pretrained
    _, quant_state_dict = get_vllm_state_dict(
^^^^^^^^^^^^^^^
  File "/root/openpipe/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
^^^^^^^^^^^^^^^
  File "/root/openpipe/.venv/lib/python3.11/site-packages/unsloth_zoo/vllm_utils.py", line 960, in get_vllm_state_dict
    proj = layer.mlp.gate_up_proj
  ^^^^^^^^^^^^^^^^^
  File "/root/openpipe/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(
  ^^^^^^^^^^^^^^^^^
AttributeError: 'Qwen3MoeSparseMoeBlock' object has no attribute 'gate_up_proj'
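For reference, the traceback shows ART forwarding its `init_args` to `FastLanguageModel.from_pretrained` in `art/unsloth/state.py`, while Unsloth's guidance for MoE checkpoints is to call `FastModel` instead. A minimal sketch of the direct Unsloth call, assuming the kwarg names mirror the config above (`model_name` is the usual Unsloth parameter, but this is an illustration, not ART's supported path):

```python
# The init_args ART would hand to Unsloth, reconstructed from the config above.
init_args = dict(
    model_name="Qwen/Qwen3-30B-A3B-Thinking-2507",  # assumed kwarg name
    load_in_4bit=False,
    max_seq_length=65536,
)

# Direct Unsloth usage per the docs (requires unsloth installed):
# from unsloth import FastModel
# model, tokenizer = FastModel.from_pretrained(**init_args)
```

ART hardcodes the `FastLanguageModel` entry point, so there is currently no config switch to select `FastModel`; the sketch only shows what the equivalent direct call would look like.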

casper-hansen avatar Oct 23 '25 10:10 casper-hansen

Hey @casper-hansen, we don't support MoE models yet. However, since vLLM has added MoE LoRA support, it should now be possible to enable this in ART.

Kovbo avatar Nov 04 '25 02:11 Kovbo

@Kovbo while suboptimal, you can train MoE models by targeting only the attention layers, which works with vLLM. Until the dependency is upgraded to the new version, that could be an alternative; it just needs a higher LoRA rank to match performance.
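A sketch of that workaround, assuming a PEFT-style LoRA config (the module names follow the usual Qwen attention layout, and the rank/alpha of 128 match the values reported later in this thread):

```python
# Attention-only LoRA targeting: skip the expert MLPs entirely so the
# adapter stays compatible with vLLM's existing LoRA serving path.
ATTENTION_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]

lora_kwargs = dict(
    r=128,              # higher rank to compensate for skipping the experts
    lora_alpha=128,
    target_modules=ATTENTION_TARGETS,
)

# With peft installed, this maps directly onto its config class:
# from peft import LoraConfig
# peft_config = LoraConfig(**lora_kwargs)
```

Because no expert/MLP modules (`gate_proj`, `up_proj`, `down_proj`) are targeted, the adapter never touches the `Qwen3MoeSparseMoeBlock` layers that trigger the `AttributeError` above.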

casper-hansen avatar Nov 04 '25 09:11 casper-hansen

@casper-hansen this is super relevant for me. Can you point me to a reference where I can read more about this? I want to post-train a Qwen MoE model as well. @Kovbo I do see a draft PR for MoE support - https://github.com/OpenPipe/ART/pull/415 - do you have an estimate of how long it will take to be ready?

RitvikKapila avatar Nov 05 '25 00:11 RitvikKapila

@RitvikKapila I don't have reading material for this; these are empirical results from my own research, which I'm repeating here. My experiments are based on Megatron training with LoRA. The loss can go lower while targeting fewer parameters if you choose a high LoRA rank/alpha of 128 (in my n=1 experiment, at least). Conversely, the loss can be higher if the rank is too low, since the adapter doesn't have enough parameters to learn the same things - so I settled on 128.
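The capacity trade-off can be made concrete with a back-of-envelope count: LoRA adds roughly `r * (d_in + d_out)` parameters per targeted matrix, so raising the rank scales adapter capacity linearly, which is how attention-only targeting can recover some of the capacity lost by skipping the expert MLPs. (The dimensions below are illustrative, not the real Qwen3 shapes.)

```python
def lora_params(r: int, d_in: int, d_out: int) -> int:
    """Parameters added by one LoRA pair (A: d_in x r, B: r x d_out)."""
    return r * (d_in + d_out)

# Going from r=32 to r=128 quadruples the adapter size per targeted matrix.
d = 4096  # placeholder hidden dimension
assert lora_params(128, d, d) == 4 * lora_params(32, d, d)
```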

My main argument here is that you could easily provide preliminary MoE + LoRA support by targeting only the attention layers, until you upgrade to vLLM v0.11.1, which ships the expert LoRA compatibility.

[Image attachment]

casper-hansen avatar Nov 05 '25 08:11 casper-hansen