[Bug] qwen3-vl 在线服务启动报错 Error code: 400 - {'message': 'Logprobs or top_logprobs requested but not enabled logprobs_mode in engine configuration.', 'type': 'invalid_request_error', 'code': 400, 'param': None, 'object': 'error'}
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
Environment: Ubuntu 22.04, A800 GPU. On 2025-11-13 I cloned the latest code and installed it:
git clone the latest code
cd lmdeploy
pip install -r requirements/build.txt
pip install -e . -v
After the installation succeeded, I started the server with:
lmdeploy serve api_server /model/Qwen3-VL-8B-sft \
    --dtype auto \
    --server-port 23333 \
    --tp 1 \
    --model-name qwen3-vl-sft \
    --max-batch-size 32 \
    --cache-max-entry-count 0.9
When sending an inference request to the service, it fails with "Error code: 400 - {'message': 'Logprobs or top_logprobs requested but not enabled logprobs_mode in engine configuration.', 'type': 'invalid_request_error', 'code': 400, 'param': None, 'object': 'error'}". What is causing this?
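For reference, this 400 is returned when the request asks for token logprobs but the engine was not started with logprobs enabled. A minimal sketch of a chat-completion payload that omits those fields — the model name and port below match this report, and the payload shape is the standard OpenAI-compatible one, not anything lmdeploy-specific:

```python
import json

# Chat-completion payload for an OpenAI-compatible endpoint.
# "logprobs" / "top_logprobs" are deliberately absent -- including them
# is what triggers the 400 unless the engine enables a logprobs mode.
payload = {
    "model": "qwen3-vl-sft",  # must match --model-name passed at server start
    "messages": [
        {"role": "user", "content": "Describe this image."},
    ],
    "temperature": 0.7,
}

assert "logprobs" not in payload and "top_logprobs" not in payload
print(json.dumps(payload, indent=2))

# To actually send it (server from this report):
# import requests
# r = requests.post("http://localhost:23333/v1/chat/completions", json=payload)
```

If the client is a framework that adds `logprobs` automatically, check its request options rather than the server flags.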
Reproduction
lmdeploy serve api_server /model/Qwen3-VL-8B-sft \
    --dtype auto \
    --server-port 23333 \
    --tp 1 \
    --model-name qwen3-vl-sft \
    --max-batch-size 32 \
    --cache-max-entry-count 0.9
Environment
Detailed package versions:
Package Version Editable project location
------------------------- ------------- -------------------------------------
accelerate 1.11.0
addict 2.4.0
aiohappyeyeballs 2.6.1
aiohttp 3.13.2
aiosignal 1.4.0
annotated-doc 0.0.4
annotated-types 0.7.0
anyio 4.11.0
attrs 25.4.0
certifi 2025.11.12
charset-normalizer 3.4.4
click 8.2.1
cmake 4.1.2
cmake-build-extension 0.6.1
distro 1.9.0
einops 0.8.1
fastapi 0.121.1
filelock 3.20.0
fire 0.7.1
frozenlist 1.8.0
fsspec 2025.10.0
gitdb 4.0.12
GitPython 3.1.45
h11 0.16.0
hf-xet 1.2.0
httpcore 1.0.9
httpx 0.28.1
huggingface-hub 0.36.0
idna 3.11
Jinja2 3.1.6
jiter 0.12.0
jsonschema 4.25.1
jsonschema-specifications 2025.9.1
lmdeploy 0.10.2 /code/SFT/lmdeploy
markdown-it-py 4.0.0
MarkupSafe 3.0.3
mdurl 0.1.2
mmengine-lite 0.10.7
mpmath 1.3.0
msgpack 1.1.2
multidict 6.7.0
networkx 3.5
ninja 1.13.0
numpy 2.3.4
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile-cu12 1.13.1.3
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-nccl-cu12 2.27.3
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvtx-cu12 12.8.90
openai 2.7.2
openai-harmony 0.0.8
packaging 25.0
partial-json-parser 0.2.1.1.post6
peft 0.14.0
pillow 12.0.0
pip 25.2
platformdirs 4.5.0
prometheus_client 0.23.1
propcache 0.4.1
protobuf 6.33.0
psutil 7.1.3
pybind11 2.13.1
pydantic 2.12.4
pydantic_core 2.41.5
Pygments 2.19.2
PyYAML 6.0.3
pyzmq 27.1.0
ray 2.51.1
referencing 0.37.0
regex 2025.11.3
requests 2.32.5
rich 14.2.0
rpds-py 0.28.0
safetensors 0.6.2
sentencepiece 0.2.1
setuptools 80.9.0
setuptools-scm 9.2.2
shortuuid 1.0.13
smmap 5.0.2
sniffio 1.3.1
starlette 0.49.3
sympy 1.14.0
termcolor 3.2.0
tiktoken 0.12.0
timm 1.0.22
tokenizers 0.22.1
torch 2.8.0
torchvision 0.23.0
tqdm 4.67.1
transformers 4.57.1
triton 3.4.0
typing_extensions 4.15.0
typing-inspection 0.4.2
urllib3 2.5.0
uvicorn 0.38.0
wheel 0.45.1
xgrammar 0.1.27
yapf 0.43.0
Error traceback
My bad. #4121 is fixing it
2025-11-15 11:59:12,743 - lmdeploy - WARNING - archs.py:45 - Fallback to pytorch engine because /model/darnellzhu/Qwen-25-VL/Qwen3-VL-8B-sft not supported by turbomind engine.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2025-11-15 11:59:24,442 - lmdeploy - WARNING - transformers.py:22 - LMDeploy requires transformers version: [4.33.0 ~ 4.56.1], but found version: 4.57.1
Process mp_engine_proc:
Traceback (most recent call last):
  File "/root/miniconda3/envs/lmdeploy_qwen/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/lmdeploy_qwen/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/mp_engine/zmq_engine.py", line 92, in _mp_proc
    engine = Engine.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 460, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 387, in __init__
    self.executor.init()
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/executor/base.py", line 224, in init
    self.build_model()
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/executor/uni_executor.py", line 53, in build_model
    self.model_agent.build_model()
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 987, in build_model
    self._build_model()
  File "/code/SFT/internvl3-lmdeploy/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 962, in _build_model
    need_output = self.dist_ctx.dp > 1 or self.dist_ctx.rank % self.dist_ctx.tp == 0
                  ^^^^^^^^^^^^^^^^
AttributeError: 'DistContext' object has no attribute 'dp'
After pulling the latest code again, this new error appeared.
Please use this commit: 02cd79b
I fine-tuned qwen3-vl with llama_factory. Inference through llama_factory gives correct results, but lmdeploy's inference results are completely wrong, even though the output format is correct.