[Bug] qwen3-vl 在线服务启动报错 Error code: 400 - {'message': 'Logprobs or top_logprobs requested but not enabled logprobs_mode in engine configuration.', 'type': 'invalid_request_error', 'code': 400, 'param': None, 'object': 'error'}
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
Environment: Ubuntu 22.04, A800 GPU. On 2025-11-13 I cloned the latest code and installed it:
git clone the latest code
cd lmdeploy
pip install -r requirements/build.txt
pip install -e . -v
After the installation succeeded, I started the server with:
lmdeploy serve api_server /model/Qwen3-VL-8B-sft \
    --dtype auto \
    --server-port 23333 \
    --tp 1 \
    --model-name qwen3-vl-sft \
    --max-batch-size 32 \
    --cache-max-entry-count 0.9
When sending an inference request to the service, it fails with "Error code: 400 - {'message': 'Logprobs or top_logprobs requested but not enabled logprobs_mode in engine configuration.', 'type': 'invalid_request_error', 'code': 400, 'param': None, 'object': 'error'}". What is causing this?
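For reference, this 400 is returned when the request asks for token logprobs but the engine was not started with logprobs enabled. A minimal sketch of a chat-completion payload that omits those fields — the model name and port below match this report, and the payload shape is the standard OpenAI-compatible one, not anything lmdeploy-specific:

```python
import json

# Chat-completion payload for an OpenAI-compatible endpoint.
# "logprobs" / "top_logprobs" are deliberately absent -- including them
# is what triggers the 400 unless the engine enables a logprobs mode.
payload = {
    "model": "qwen3-vl-sft",  # must match --model-name passed at server start
    "messages": [
        {"role": "user", "content": "Describe this image."},
    ],
    "temperature": 0.7,
}

assert "logprobs" not in payload and "top_logprobs" not in payload
print(json.dumps(payload, indent=2))

# To actually send it (server from this report):
# import requests
# r = requests.post("http://localhost:23333/v1/chat/completions", json=payload)
```

If the client is a framework that adds `logprobs` automatically, check its request options rather than the server flags.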
Reproduction
lmdeploy serve api_server /model/Qwen3-VL-8B-sft \
    --dtype auto \
    --server-port 23333 \
    --tp 1 \
    --model-name qwen3-vl-sft \
    --max-batch-size 32 \
    --cache-max-entry-count 0.9
Environment
Detailed package versions:
Package Version Editable project location
------------------------- ------------- -------------------------------------
accelerate 1.11.0
addict 2.4.0
aiohappyeyeballs 2.6.1
aiohttp 3.13.2
aiosignal 1.4.0
annotated-doc 0.0.4
annotated-types 0.7.0
anyio 4.11.0
attrs 25.4.0
certifi 2025.11.12
charset-normalizer 3.4.4
click 8.2.1
cmake 4.1.2
cmake-build-extension 0.6.1
distro 1.9.0
einops 0.8.1
fastapi 0.121.1
filelock 3.20.0
fire 0.7.1
frozenlist 1.8.0
fsspec 2025.10.0
gitdb 4.0.12
GitPython 3.1.45
h11 0.16.0
hf-xet 1.2.0
httpcore 1.0.9
httpx 0.28.1
huggingface-hub 0.36.0
idna 3.11
Jinja2 3.1.6
jiter 0.12.0
jsonschema 4.25.1
jsonschema-specifications 2025.9.1
lmdeploy 0.10.2 /code/SFT/lmdeploy
markdown-it-py 4.0.0
MarkupSafe 3.0.3
mdurl 0.1.2
mmengine-lite 0.10.7
mpmath 1.3.0
msgpack 1.1.2
multidict 6.7.0
networkx 3.5
ninja 1.13.0
numpy 2.3.4
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile-cu12 1.13.1.3
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-nccl-cu12 2.27.3
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvtx-cu12 12.8.90
openai 2.7.2
openai-harmony 0.0.8
packaging 25.0
partial-json-parser 0.2.1.1.post6
peft 0.14.0
pillow 12.0.0
pip 25.2
platformdirs 4.5.0
prometheus_client 0.23.1
propcache 0.4.1
protobuf 6.33.0
psutil 7.1.3
pybind11 2.13.1
pydantic 2.12.4
pydantic_core 2.41.5
Pygments 2.19.2
PyYAML 6.0.3
pyzmq 27.1.0
ray 2.51.1
referencing 0.37.0
regex 2025.11.3
requests 2.32.5
rich 14.2.0
rpds-py 0.28.0
safetensors 0.6.2
sentencepiece 0.2.1
setuptools 80.9.0
setuptools-scm 9.2.2
shortuuid 1.0.13
smmap 5.0.2
sniffio 1.3.1
starlette 0.49.3
sympy 1.14.0
termcolor 3.2.0
tiktoken 0.12.0
timm 1.0.22
tokenizers 0.22.1
torch 2.8.0
torchvision 0.23.0
tqdm 4.67.1
transformers 4.57.1
triton 3.4.0
typing_extensions 4.15.0
typing-inspection 0.4.2
urllib3 2.5.0
uvicorn 0.38.0
wheel 0.45.1
xgrammar 0.1.27
yapf 0.43.0
Error traceback
My bad. #4121 is fixing it
2025-11-15 11:59:12,743 - lmdeploy - WARNING - archs.py:45 - Fallback to pytorch engine because /model/darnellzhu/Qwen-25-VL/Qwen3-VL-8B-sft not supported by turbomind engine.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2025-11-15 11:59:24,442 - lmdeploy - WARNING - transformers.py:22 - LMDeploy requires transformers version: [4.33.0 ~ 4.56.1], but found version: 4.57.1
Process mp_engine_proc:
Traceback (most recent call last):
  File "/root/miniconda3/envs/lmdeploy_qwen/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/lmdeploy_qwen/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/mp_engine/zmq_engine.py", line 92, in _mp_proc
    engine = Engine.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 460, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 387, in __init__
    self.executor.init()
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/executor/base.py", line 224, in init
    self.build_model()
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/executor/uni_executor.py", line 53, in build_model
    self.model_agent.build_model()
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 987, in build_model
    self._build_model()
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 962, in _build_model
    need_output = self.dist_ctx.dp > 1 or self.dist_ctx.rank % self.dist_ctx.tp == 0
                  ^^^^^^^^^^^^^^^^
AttributeError: 'DistContext' object has no attribute 'dp'
After pulling the latest code again, this new error appeared.
Please use this commit: 02cd79b
I fine-tuned qwen3-vl with llama_factory. Inference through llama_factory gives correct results, but lmdeploy's inference results are completely wrong, even though the output format is correct.