[Bug] Unable to create Qwen3 MoE model
How do I get ART to instantiate the model with Unsloth's FastModel instead of FastLanguageModel? (The Unsloth docs say: "If you're fine-tuning the MoE models, please use FastModel and not FastLanguageModel.") I seem to be running into a model loading issue, as shown in the error below.
model = art.TrainableModel(
    name="001-script",
    project="testing",
    base_model="Qwen/Qwen3-30B-A3B-Thinking-2507",
    _internal_config=art.dev.InternalModelConfig(
        init_args=art.dev.InitArgs(
            load_in_4bit=False,
            max_seq_length=65536,
        ),
        engine_args=art.dev.EngineArgs(
            max_model_len=65536,
            tensor_parallel_size=8,
            gpu_memory_utilization=0.75,
        ),
    ),
)
await model.register(backend)
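For context, this is roughly what the Unsloth docs' MoE recommendation looks like when loading directly (outside ART); a sketch only, since ART builds these kwargs internally from InitArgs:

```python
# Sketch of the Unsloth docs' recommendation for MoE models:
# FastModel rather than FastLanguageModel. The kwargs mirror the
# InitArgs above; this is not ART code.
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Qwen/Qwen3-30B-A3B-Thinking-2507",
    max_seq_length=65536,
    load_in_4bit=False,
)
```

Note that in the traceback below, Unsloth's loader already dispatches FastLanguageModel.from_pretrained into FastModel.from_pretrained for this model, so the failure occurs inside the FastModel path itself.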
Error:
Traceback (most recent call last):
File "/root/openpipe/train.py", line 337, in <module>
asyncio.run(train())
File "/root/openpipe/.venv/lib/python3.11/site-packages/nest_asyncio.py", line 30, in run
return loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/openpipe/.venv/lib/python3.11/site-packages/nest_asyncio.py", line 98, in run_until_complete
return f.result()
^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/futures.py", line 203, in result
raise self._exception.with_traceback(self._exception_tb)
File "/usr/lib/python3.11/asyncio/tasks.py", line 277, in __step
result = coro.send(None)
^^^^^^^^^^^^^^^
File "/root/openpipe/train.py", line 296, in train
await model.register(backend)
File "/root/openpipe/.venv/lib/python3.11/site-packages/art/model.py", line 335, in register
base_url, api_key = await backend._prepare_backend_for_training(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/openpipe/.venv/lib/python3.11/site-packages/art/local/backend.py", line 282, in _prepare_backend_for_training
await service.start_openai_server(config=config)
File "/root/openpipe/.venv/lib/python3.11/site-packages/mp_actors/traceback.py", line 26, in async_wrapper
raise e.with_traceback(streamlined_traceback())
File "/root/openpipe/.venv/lib/python3.11/site-packages/art/unsloth/service.py", line 60, in start_openai_server
self.state.trainer.save_model(lora_path)
^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/functools.py", line 1001, in __get__
val = self.func(instance)
^^^^^^^
File "/root/openpipe/.venv/lib/python3.11/site-packages/art/unsloth/service.py", line 45, in state
return ModelState(self.config)
^^^^^^^^^^^^^^^^^
File "/root/openpipe/.venv/lib/python3.11/site-packages/art/unsloth/state.py", line 82, in __init__
unsloth.FastLanguageModel.from_pretrained(**config.get("init_args", {})),
^^^^^^^^^^^^^^^
File "/root/openpipe/.venv/lib/python3.11/site-packages/unsloth/models/loader.py", line 397, in from_pretrained
return FastModel.from_pretrained(
^^^^^^^^^^^^^^^
File "/root/openpipe/.venv/lib/python3.11/site-packages/unsloth/models/loader.py", line 930, in from_pretrained
model, tokenizer = FastBaseModel.from_pretrained(
^^^^^^^^^^^^^^^^^
File "/root/openpipe/.venv/lib/python3.11/site-packages/unsloth/models/vision.py", line 621, in from_pretrained
_, quant_state_dict = get_vllm_state_dict(
^^^^^^^^^^^^^^^
File "/root/openpipe/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^
File "/root/openpipe/.venv/lib/python3.11/site-packages/unsloth_zoo/vllm_utils.py", line 960, in get_vllm_state_dict
proj = layer.mlp.gate_up_proj
^^^^^^^^^^^^^^^^^
File "/root/openpipe/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
raise AttributeError(
^^^^^^^^^^^^^^^^^
AttributeError: 'Qwen3MoeSparseMoeBlock' object has no attribute 'gate_up_proj'
Hey @casper-hansen, we don't support MoE models yet. However, since vLLM has added MoE LoRA support, it should now be possible to enable this in ART.
@Kovbo while suboptimal, you can train MoE models by targeting only the attention layers, which works with vLLM. Until the dependency is upgraded to the new version, that could be an alternative; it just needs a higher LoRA rank to match the performance.
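To make the workaround concrete, here is a minimal sketch of the LoRA settings it implies, assuming standard Qwen-style attention projection names and something like PEFT's LoraConfig kwargs; the specific module names are an assumption, not from this thread:

```python
# Sketch (assumed module names): LoRA kwargs targeting only the
# attention projections, with the higher rank/alpha discussed below
# to compensate for the smaller number of trained parameters.
attention_only_lora = {
    "r": 128,            # higher rank since fewer modules are targeted
    "lora_alpha": 128,
    "target_modules": [  # attention projections only; no MoE expert/MLP layers
        "q_proj", "k_proj", "v_proj", "o_proj",
    ],
}
```

The key point is that the MoE expert layers (the ones the traceback trips over) are left out of target_modules entirely, so vLLM never has to apply adapters to them.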
@casper-hansen this is super relevant for me. Can you point me to some reference where I can read more about this? I want to post-train a Qwen MoE model as well. @Kovbo I do see a draft PR out for MoE support - https://github.com/OpenPipe/ART/pull/415 - do you have any estimate on how long this will take to be ready?
@RitvikKapila I don't have reading material for this. These are empirical results from my own research, which I am repeating here. My experiments are based on Megatron training with LoRA. The loss can go lower while targeting fewer parameters if you choose a high LoRA rank/alpha of 128 (in my n=1 experiment, at least). Conversely, the loss can also be higher if the rank is not high enough, since the adapter doesn't have enough parameters to learn the same things - so I settled on 128.
My main argument here is that you could easily provide preliminary MoE + LoRA support by targeting just the attention layers, until you upgrade to vLLM v0.11.1, where expert-layer LoRA compatibility is released.