
[Bug] Qwen3-30B MoE doesn't work, while Qwen3-32B works.

wudajun7509 opened this issue 9 months ago • 2 comments

Checklist

  • [ ] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

```text
Add dll path C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin, please note cuda version should >= 11.3 when compiled with cuda 11
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type qwen3_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2025-05-02 08:36:31,341 - lmdeploy - WARNING - supported_models.py:121 - AutoConfig.from_pretrained failed for C:\Qwen\Qwen3-30B-A3B. Exception: The checkpoint you are trying to load has model type qwen3_moe but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command pip install --upgrade transformers. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git

[the trust_remote_code / qwen3_moe warning pair above repeats several more times]

Convert to turbomind format:   0%|          | 0/48 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\danny\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\danny\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\lmdeploy\Scripts\lmdeploy.exe\__main__.py", line 7, in <module>
    sys.exit(run())
  File "C:\lmdeploy\lib\site-packages\lmdeploy\cli\entrypoint.py", line 39, in run
    args.run(args)
  File "C:\lmdeploy\lib\site-packages\lmdeploy\cli\serve.py", line 322, in api_server
    run_api_server(args.model_path,
  File "C:\lmdeploy\lib\site-packages\lmdeploy\serve\openai\api_server.py", line 1115, in serve
    VariableInterface.async_engine = pipeline_class(model_path=model_path,
  File "C:\lmdeploy\lib\site-packages\lmdeploy\serve\async_engine.py", line 277, in __init__
    self._build_turbomind(model_path=model_path, backend_config=backend_config, **kwargs)
  File "C:\lmdeploy\lib\site-packages\lmdeploy\serve\async_engine.py", line 328, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(model_path,
  File "C:\lmdeploy\lib\site-packages\lmdeploy\turbomind\turbomind.py", line 280, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "C:\lmdeploy\lib\site-packages\lmdeploy\turbomind\turbomind.py", line 108, in __init__
    self.model_comm = self._from_hf(model_source=model_source,
  File "C:\lmdeploy\lib\site-packages\lmdeploy\turbomind\turbomind.py", line 215, in _from_hf
    tm_model.export()
  File "C:\lmdeploy\lib\site-packages\lmdeploy\turbomind\deploy\target_model\base.py", line 204, in export
    if self.model(i, reader):
  File "C:\lmdeploy\lib\site-packages\lmdeploy\turbomind\deploy\module.py", line 334, in __call__
    m(i, r)
  File "C:\lmdeploy\lib\site-packages\lmdeploy\turbomind\deploy\module.py", line 71, in __call__
    return self.apply(*args, **kwargs)
  File "C:\lmdeploy\lib\site-packages\lmdeploy\turbomind\deploy\module.py", line 151, in apply
    self.model.save_split(gate, self._moe_ffn_gate.format(i))
  File "C:\lmdeploy\lib\site-packages\lmdeploy\turbomind\deploy\target_model\base.py", line 172, in save_split
    if copy or (tensor.dim() == 1 and split_dim == 0):
AttributeError: 'NoneType' object has no attribute 'dim'
```
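The final AttributeError suggests the exporter got `None` where it expected a MoE gate tensor and then called `.dim()` on it. A minimal sketch of that failure mode, with a defensive guard (all names here are illustrative stand-ins, not lmdeploy's actual code):

```python
class FakeTensor:
    """Stand-in for a torch tensor with just the method this sketch needs."""
    def __init__(self, ndim):
        self._ndim = ndim

    def dim(self):
        return self._ndim


def save_split(tensor, name, split_dim=0, copy=False):
    # Guard: the failing version skips this check, so a missing weight
    # surfaces as "'NoneType' object has no attribute 'dim'" instead of
    # a clear error naming the weight.
    if tensor is None:
        raise ValueError(f"weight {name!r} missing from checkpoint")
    if copy or (tensor.dim() == 1 and split_dim == 0):
        return f"saved {name} whole"
    return f"saved {name} split along dim {split_dim}"


print(save_split(FakeTensor(1), "layers.0.moe_ffn.gate.weight"))
# With tensor=None, the unguarded version would raise exactly the
# AttributeError seen in the traceback above.
```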

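The log also suggests upgrading Transformers. A quick sketch (not lmdeploy tooling) to check whether the installed Transformers release actually registers the `qwen3_moe` architecture; `CONFIG_MAPPING_NAMES` is Transformers' internal model-type registry, so if the lookup fails, `pip install --upgrade transformers` is the first thing to try:

```python
def knows_model_type(model_type):
    """Return True/False if transformers is installed, None otherwise."""
    try:
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES
    except ImportError:
        return None  # transformers not installed in this environment
    return model_type in CONFIG_MAPPING_NAMES


status = knows_model_type("qwen3_moe")
if status is None:
    print("transformers is not installed")
elif status:
    print("qwen3_moe is registered; the warning has some other cause")
else:
    print("qwen3_moe is NOT registered; try: pip install --upgrade transformers")
```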
Reproduction

```shell
lmdeploy serve api_server C:\Qwen\Qwen3-30B-A3B --server-port 1234 --api-key abcd+1234 --session-len 131072 --max-batch-size 128
```
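Since the crash happens inside the TurboMind weight converter, one thing worth trying (an untested assumption, not a confirmed fix) is serving with the PyTorch engine via lmdeploy's `--backend` flag, which skips that conversion step:

```shell
# Untested workaround sketch: select the PyTorch engine instead of TurboMind
lmdeploy serve api_server C:\Qwen\Qwen3-30B-A3B --backend pytorch --server-port 1234 --session-len 131072
```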

Environment

Qwen3-32B worked

Error traceback


wudajun7509 avatar May 02 '25 00:05 wudajun7509

Ran into the same problem; additionally, I ran the 235B-22B model and found it didn't work properly either.

warlockedward avatar May 04 '25 02:05 warlockedward

When I upgraded to 0.8.0, it prints: "You are using a model of type qwen3_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors." And now Qwen3-32B can't work either!

wudajun7509 avatar May 06 '25 00:05 wudajun7509