[Bug] Qwen3's enable_thinking parameter is currently not supported
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
The Qwen3 series models support a switch to control whether thinking is enabled, e.g. `"chat_template_kwargs": {"enable_thinking": false}`, but the current lmdeploy version does not support it yet.
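For context, a minimal sketch (an assumption, simplified from how Qwen3's chat template behaves; `render_prompt` is a hypothetical helper, not lmdeploy or transformers API) of what `enable_thinking=False` does to the rendered prompt: it pre-fills an empty think block at the start of the assistant turn so the model skips emitting reasoning tokens.

```python
# Hypothetical, simplified rendering of a Qwen3-style chat prompt.
# Assumption: this mirrors the template's enable_thinking branch; it is
# NOT the actual Qwen3 chat template or any lmdeploy code.
def render_prompt(messages, enable_thinking=True):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Generation prompt for the assistant turn.
    parts.append("<|im_start|>assistant\n")
    if not enable_thinking:
        # Pre-filled empty think block: the model sees reasoning as
        # already "done" and answers directly.
        parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)
```

This is why the flag must be handled at prompt-rendering time on the server side; a client cannot achieve the same effect after generation has started.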
Reproduction
The request is as follows:
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "Qwen/Qwen3-8B", "messages": [ {"role": "user", "content": "Give me a short introduction to large language models."} ], "temperature": 0.7, "top_p": 0.8, "max_tokens": 1024, "chat_template_kwargs": {"enable_thinking": false} }'
Environment
lmdeploy 0.7.3
transformers 4.51.3
Error traceback
This has been a widely requested feature since the release of Qwen3; we hope the `enable_thinking` parameter will be supported.
I don't think this is an acceptable alternative way to solve the `enable_thinking` issue: the response still begins with the prefix `<think>\n\n</think>\n\n`, which definitely increases generation time. If I am building a time-constrained recommendation system/agent that relies on Qwen3, the `enable_thinking` parameter is necessary.
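To make the objection concrete: the workaround being rejected presumably amounts to stripping the empty think block on the client side, along these lines (a hypothetical sketch; `strip_empty_think` is not an lmdeploy API). It removes the tags from the text but does nothing about the tokens already generated, so the latency cost remains.

```python
import re

def strip_empty_think(text):
    # Hypothetical client-side cleanup: remove a leading empty think
    # block such as "<think>\n\n</think>\n\n" from the model output.
    # The <think> tokens were still generated and billed/waited for.
    return re.sub(r"^<think>\s*</think>\s*", "", text)
```

Only disabling thinking in the chat template before generation avoids spending time on those tokens.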
The purpose of Qwen3's enable_thinking is to occupy the
Supported in #3641.