[Bug] Qwen3-235B-A22B-FP8 cannot be loaded with the PyTorch backend
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
I can serve Qwen3-235B-A22B normally, but I cannot launch the official FP8 checkpoint (Qwen3-235B-A22B-FP8) no matter what I try.
Reproduction
lmdeploy serve api_server Qwen3-235B-A22B-FP8 --server-port 8811 --cache-max-entry-count 0.9 --tp 8 --log-level INFO --backend pytorch
Environment
sys.platform: linux
Python: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA H20
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.5.1+cu121
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 12.1
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 90.1 (built against CUDA 12.4)
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.20.1+cu121
LMDeploy: 0.8.0+
transformers: 4.51.0
gradio: 5.22.0
fastapi: 0.115.11
pydantic: 2.10.6
triton: 3.1.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 PIX SYS SYS SYS SYS SYS SYS SYS 0-383 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 SYS PIX PHB SYS SYS SYS SYS SYS 0-383 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 SYS PHB PIX SYS SYS SYS SYS SYS 0-383 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 SYS SYS SYS PIX SYS SYS SYS SYS 0-383 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 SYS SYS SYS SYS PIX SYS SYS SYS 0-383 0 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 SYS SYS SYS SYS SYS PIX SYS SYS 0-383 0 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 SYS SYS SYS SYS SYS SYS PIX PHB 0-383 0 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X SYS SYS SYS SYS SYS SYS PHB PIX 0-383 0 N/A
NIC0 PIX SYS SYS SYS SYS SYS SYS SYS X SYS SYS SYS SYS SYS SYS SYS
NIC1 SYS PIX PHB SYS SYS SYS SYS SYS SYS X PHB SYS SYS SYS SYS SYS
NIC2 SYS PHB PIX SYS SYS SYS SYS SYS SYS PHB X SYS SYS SYS SYS SYS
NIC3 SYS SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS X SYS SYS SYS SYS
NIC4 SYS SYS SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS X SYS SYS SYS
NIC5 SYS SYS SYS SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS X SYS SYS
NIC6 SYS SYS SYS SYS SYS SYS PIX PHB SYS SYS SYS SYS SYS SYS X PHB
NIC7 SYS SYS SYS SYS SYS SYS PHB PIX SYS SYS SYS SYS SYS SYS PHB X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_bond_0
NIC1: mlx5_bond_1
NIC2: mlx5_bond_2
NIC3: mlx5_bond_3
NIC4: mlx5_bond_4
NIC5: mlx5_bond_5
NIC6: mlx5_bond_6
NIC7: mlx5_bond_7
Error traceback
2025-05-13 11:22:35,063 - lmdeploy - INFO - async_engine.py:259 - input backend=pytorch, backend_config=PytorchEngineConfig(dtype='auto', tp=8, dp=1, dp_rank=0, ep=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.9, prefill_interval=16, block_size=64, num_cpu_blocks=0, num_gpu_blocks=0, adapters=None, max_prefill_token_num=8192, thread_safe=False, enable_prefix_caching=False, device_type='cuda', eager_mode=False, custom_module_map=None, download_dir=None, revision=None, quant_policy=0, distributed_executor_backend=None, enable_microbatch=False)
2025-05-13 11:22:35,063 - lmdeploy - INFO - async_engine.py:260 - input chat_template_config=None
2025-05-13 11:22:35,065 - lmdeploy - INFO - async_engine.py:269 - updated chat_template_onfig=ChatTemplateConfig(model_name='qwen', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, tool=None, eotool=None, separator=None, capability=None, stop_words=None)
2025-05-13 11:22:36,282 - lmdeploy - WARNING - transformers.py:22 - LMDeploy requires transformers version: [4.33.0 ~ 4.49.0], but found version: 4.51.0
2025-05-13 11:22:36,374 - lmdeploy - INFO - __init__.py:81 - Build <ray> executor.
2025-05-13 11:22:37,113 - lmdeploy - INFO - ray_executor.py:247 - Init ray cluster.
2025-05-13 11:22:37,140 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.126.218.17:8192...
2025-05-13 11:22:37,151 INFO worker.py:1852 -- Connected to Ray cluster.
2025-05-13 11:22:37,306 - lmdeploy - INFO - ray_executor.py:275 - Init ray workers.
2025-05-13 11:22:37,382 - lmdeploy - INFO - ray_executor.py:281 - Init distributed environment by device.
2025-05-13 11:22:39,991 - lmdeploy - INFO - ray_executor.py:284 - Init distributed process group.
(RayWorkerWrapper pid=24334) 2025-05-13 11:22:39,993 - lmdeploy - INFO - dist_utils.py:29 - MASTER_ADDR=10.126.218.17, MASTER_PORT=60933
2025-05-13 11:22:42,297 - lmdeploy - INFO - ray_executor.py:294 - Warming up distribute environment, this might take long time, please waiting...
2025-05-13 11:23:02,798 - lmdeploy - INFO - base.py:152 - Building Model.
Loading weights from safetensors: 0%| | 0/48 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/opt/py3/bin/lmdeploy", line 8, in <module>
sys.exit(run())
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 39, in run
args.run(args)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/cli/serve.py", line 333, in api_server
run_api_server(args.model_path,
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 1121, in serve
VariableInterface.async_engine = pipeline_class(model_path=model_path,
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 279, in __init__
self._build_pytorch(model_path=model_path, backend_config=backend_config, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 341, in _build_pytorch
self.engine = Engine(model_path=model_path, tokenizer=self.tokenizer, engine_config=backend_config)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 335, in __init__
self.executor.init()
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/base.py", line 153, in init
self.build_model()
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 311, in build_model
self.collective_rpc('build_model')
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 307, in collective_rpc
return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
File "/opt/py3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/ray/_private/worker.py", line 2782, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/opt/py3/lib/python3.10/site-packages/ray/_private/worker.py", line 929, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::RayWorkerWrapper.build_model() (pid=26688, ip=10.126.218.17, actor_id=867d5462c52f82ceac0930ba06000000, repr=<lmdeploy.pytorch.engine.executor.ray_executor.RayWorkerWrapper object at 0x7fb43870a410>)
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/base_worker.py", line 98, in build_model
self.model_agent.build_model()
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 621, in build_model
self._build_model()
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 612, in _build_model
load_model_weights(patched_model, model_path, device=device)
File "/opt/py3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/weight_loader/model_weight_loader.py", line 166, in load_model_weights
loader.load_model_weights(model, device=device)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/weight_loader/model_weight_loader.py", line 157, in load_model_weights
model.load_weights(weights_iterator)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/models/qwen3_moe.py", line 511, in load_weights
self._load_weight_experts(name, loaded_weight, params_dict, expert_params_mapping=expert_params_mapping)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/models/qwen3_moe.py", line 471, in _load_weight_experts
load_weight(param, loaded_weight, expert_id=expert_id, shard_id=shard_id)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/weight_loader/model_weight_loader.py", line 20, in load_weight
param.weight_loader(param, loaded_weight, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/nn/moe.py", line 470, in weight_loader_scale_tp
param_data.copy_(weight)
RuntimeError: output with shape [1, 32] doesn't match the broadcast shape [2, 32]
Loading weights from safetensors: 0%| | 0/48 [00:01<?, ?it/s]
(RayWorkerWrapper pid=27612) 2025-05-13 11:22:39,993 - lmdeploy - INFO - dist_utils.py:29 - MASTER_ADDR=10.126.218.17, MASTER_PORT=60933 [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
Can you try to use --tp 4?
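For example, the reproduction command from above with only the TP value changed:
lmdeploy serve api_server Qwen3-235B-A22B-FP8 --server-port 8811 --cache-max-entry-count 0.9 --tp 4 --log-level INFO --backend pytorch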
Amazing! tp=4 can load the model, but it then looks like it crashes (possibly OOM), even though I only sent one request.
(RayWorkerWrapper pid=31401) *** SIGFPE received at time=1747278715 on cpu 140 ***
(RayWorkerWrapper pid=31401) PC: @ 0x7f08c7cff921 (unknown) (unknown)
(RayWorkerWrapper pid=31401) @ 0x7f3a90cc1520 (unknown) (unknown)
(RayWorkerWrapper pid=31401) [2025-05-15 11:11:55,573 E 31401 20200] logging.cc:497: *** SIGFPE received at time=1747278715 on cpu 140 ***
(RayWorkerWrapper pid=31401) [2025-05-15 11:11:55,573 E 31401 20200] logging.cc:497: PC: @ 0x7f08c7cff921 (unknown) (unknown)
(RayWorkerWrapper pid=31401) [2025-05-15 11:11:55,573 E 31401 20200] logging.cc:497: @ 0x7f3a90cc1520 (unknown) (unknown)
(RayWorkerWrapper pid=31401) Fatal Python error: Floating point exception
(RayWorkerWrapper pid=31401)
(RayWorkerWrapper pid=31401) Stack (most recent call first):
(RayWorkerWrapper pid=31401) File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/default/linear.py", line 36 in forward
(RayWorkerWrapper pid=31401) File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/nn/linear.py", line 1289 in forward
(RayWorkerWrapper pid=31401) File "/opt/py3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747 in _call_impl
(RayWorkerWrapper pid=31401) File "/opt/py3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736 in _wrapped_call_impl
(RayWorkerWrapper pid=31401) File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/models/qwen3_moe.py", line 428 in get_logits
(RayWorkerWrapper pid=31401) File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/graph_runner.py", line 41 in get_logits
(RayWorkerWrapper pid=31401) File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 666 in get_logits
(RayWorkerWrapper pid=31401) File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 308 in _async_model_forward
(RayWorkerWrapper pid=31401) File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 421 in _async_step_background
(RayWorkerWrapper pid=31401) File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 484 in _async_loop_background
(RayWorkerWrapper pid=31401) File "/usr/lib/python3.10/asyncio/events.py", line 80 in _run
(RayWorkerWrapper pid=31401) File "/usr/lib/python3.10/asyncio/base_events.py", line 1909 in _run_once
(RayWorkerWrapper pid=31401) File "/usr/lib/python3.10/asyncio/base_events.py", line 603 in run_forever
(RayWorkerWrapper pid=31401) File "/usr/lib/python3.10/threading.py", line 953 in run
(RayWorkerWrapper pid=31401) File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
(RayWorkerWrapper pid=31401) File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap
(RayWorkerWrapper pid=31401)
(RayWorkerWrapper pid=31401) Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, ray._raylet, markupsafe._speedups, PIL._imaging, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, PIL._imagingft, av._core, av.logging, av.bytesource, av.buffer, av.audio.format, av.error, av.dictionary, av.container.pyio, av.utils, av.option, av.descriptor, av.format, av.stream, av.container.streams, av.sidedata.motionvectors, av.sidedata.sidedata, av.opaque, av.packet, av.container.input, av.container.output, av.container.core, av.codec.context, av.video.format, av.video.reformatter, av.plane, av.video.plane, av.video.frame, av.video.stream, av.codec.hwaccel, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.filter.pad, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.filter.loudnorm, av.audio.resampler, av.audio.codeccontext, av.audio.fifo, av.bitstream, av.video.codeccontext, pyarrow.lib, pyarrow._json, regex._regex, cuda_utils, __triton_launcher (total: 88)
(RayWorkerWrapper pid=31430)
(RayWorkerWrapper pid=31430)
(RayWorkerWrapper pid=31748)
(RayWorkerWrapper pid=31748)
(RayWorkerWrapper pid=32064)
(RayWorkerWrapper pid=32064)
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffafc57591e5e1d31b8ab199850b000000 Worker ID: 3d984aae224b11df6d4da0fefb10b1a54342e21e643a1dc2e0acdec1 Node ID: b3a87b9949476ecbdb3ff5931b6722e8fb66663936403786b097ad97 Worker IP address: 10.126.218.17 Worker port: 10326 Worker PID: 32064 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
2025-05-15 11:11:56,318 - lmdeploy - ERROR - ray_executor.py:360 - Task-7 task failed.
Traceback (most recent call last):
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 354, in _prefetch_task_callback
task.result()
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 343, in _prefetch_outputs
outs = await self.workers[0].get_outputs.remote()
ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task.
class_name: RayWorkerWrapper
actor_id: 8237db0f5dc4ecdddc0598910b000000
pid: 31748
namespace: 1036a397-f807-4346-8d47-1949e44824eb
ip: 10.126.218.17
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
I also tried --cache-max-entry-count 0.6, but got the same error.
@Juniper1021 The error trace looks similar to https://github.com/InternLM/lmdeploy/issues/3343. Could you try:
pip install nvidia-cublas-cu12==12.4.5.8
tp=4 now works normally, but tp=8 still fails with an error:
Traceback (most recent call last):
File "/opt/py3/bin/lmdeploy", line 8, in <module>
sys.exit(run())
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 39, in run
args.run(args)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/cli/serve.py", line 333, in api_server
run_api_server(args.model_path,
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 1121, in serve
VariableInterface.async_engine = pipeline_class(model_path=model_path,
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 279, in __init__
self._build_pytorch(model_path=model_path, backend_config=backend_config, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 341, in _build_pytorch
self.engine = Engine(model_path=model_path, tokenizer=self.tokenizer, engine_config=backend_config)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 335, in __init__
self.executor.init()
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/base.py", line 153, in init
self.build_model()
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 311, in build_model
self.collective_rpc('build_model')
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 307, in collective_rpc
return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
File "/opt/py3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/ray/_private/worker.py", line 2782, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/opt/py3/lib/python3.10/site-packages/ray/_private/worker.py", line 929, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(IndexError): ray::RayWorkerWrapper.build_model() (pid=33018, ip=10.126.218.17, actor_id=6312aed4056f927b158762bc0e000000, repr=<lmdeploy.pytorch.engine.executor.ray_executor.RayWorkerWrapper object at 0x7f0cc17d48e0>)
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/base_worker.py", line 98, in build_model
self.model_agent.build_model()
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 621, in build_model
self._build_model()
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 612, in _build_model
load_model_weights(patched_model, model_path, device=device)
File "/opt/py3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/weight_loader/model_weight_loader.py", line 166, in load_model_weights
loader.load_model_weights(model, device=device)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/weight_loader/model_weight_loader.py", line 157, in load_model_weights
model.load_weights(weights_iterator)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/models/qwen3_moe.py", line 511, in load_weights
self._load_weight_experts(name, loaded_weight, params_dict, expert_params_mapping=expert_params_mapping)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/models/qwen3_moe.py", line 471, in _load_weight_experts
load_weight(param, loaded_weight, expert_id=expert_id, shard_id=shard_id)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/weight_loader/model_weight_loader.py", line 20, in load_weight
param.weight_loader(param, loaded_weight, **kwargs)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/nn/moe.py", line 467, in weight_loader_scale_tp
weight = loaded_weight.chunk(world_size, dim=1)[rank]
IndexError: tuple index out of range
@CUHKSZzxy Can Qwen3-235B-A22B-FP8 be used with DP and EP? If so, can you provide an example?
- You should use TP4 rather than TP8 for Qwen3-235B-A22B-FP8; the FP8 weights (and their block scales) cannot be evenly split across that many TP ranks (see the sketch below).
- No, we haven't tested DP + EP for Qwen3-235B-A22B-FP8.
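For reference, a minimal sketch in plain PyTorch (not lmdeploy code; the tensor sizes are illustrative assumptions, not values read from the checkpoint) of how a scale dimension that is not evenly divisible by 8 TP ranks can produce both errors seen above:

```python
import torch

# Hypothetical per-expert FP8 scale tensor whose sharded dimension (12 here,
# an assumed value) is not evenly divisible by 8 TP ranks.
world_size = 8
loaded_scale = torch.ones(12, 32)

# torch.chunk with 8 requested chunks on a dim of size 12 returns only
# 6 chunks of 2 rows each, so ranks 6 and 7 index past the end:
chunks = loaded_scale.chunk(world_size, dim=0)
print(len(chunks))  # 6 -> chunks[7] raises "IndexError: tuple index out of range"

# Even a rank that does get a chunk receives 2 rows, while a shard allocated
# as 12 // 8 = 1 row (an assumption about the per-rank allocation) cannot hold it:
param_data = torch.empty(1, 32)
try:
    param_data.copy_(chunks[0])
except RuntimeError as e:
    print(e)  # output with shape [1, 32] doesn't match the broadcast shape [2, 32]
```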