[Bug] Qwen3's enable_thinking parameter is currently not supported
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
The Qwen3 series models support a switch to control whether thinking is enabled, e.g. `"chat_template_kwargs": {"enable_thinking": false}`, but the current lmdeploy version does not support it yet.
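For context, a minimal sketch (an assumption, simplified from how Qwen3's chat template behaves; `render_prompt` is a hypothetical helper, not lmdeploy or transformers API) of what `enable_thinking=False` does to the rendered prompt: it pre-fills an empty think block at the start of the assistant turn so the model skips emitting reasoning tokens.

```python
# Hypothetical, simplified rendering of a Qwen3-style chat prompt.
# Assumption: this mirrors the template's enable_thinking branch; it is
# NOT the actual Qwen3 chat template or any lmdeploy code.
def render_prompt(messages, enable_thinking=True):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Generation prompt for the assistant turn.
    parts.append("<|im_start|>assistant\n")
    if not enable_thinking:
        # Pre-filled empty think block: the model sees reasoning as
        # already "done" and answers directly.
        parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)
```

This is why the flag must be handled at prompt-rendering time on the server side; a client cannot achieve the same effect after generation has started.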
Reproduction
The request is as follows:
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "Qwen/Qwen3-8B", "messages": [ {"role": "user", "content": "Give me a short introduction to large language models."} ], "temperature": 0.7, "top_p": 0.8, "max_tokens": 1024, "chat_template_kwargs": {"enable_thinking": false} }'
Environment
lmdeploy 0.7.3
transformers 4.51.3
Error traceback
This has been a widely requested feature since the release of Qwen3; we hope the `enable_thinking` parameter will be supported.
I don't think this is an acceptable alternative way to solve the `enable_thinking` issue: the response still begins with the prefix `<think>\n\n</think>\n\n`, which definitely increases generation time. If I am building a time-constrained recommendation system/agent that relies on Qwen3, the `enable_thinking` parameter is necessary.
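To make the objection concrete: the workaround being rejected presumably amounts to stripping the empty think block on the client side, along these lines (a hypothetical sketch; `strip_empty_think` is not an lmdeploy API). It removes the tags from the text but does nothing about the tokens already generated, so the latency cost remains.

```python
import re

def strip_empty_think(text):
    # Hypothetical client-side cleanup: remove a leading empty think
    # block such as "<think>\n\n</think>\n\n" from the model output.
    # The <think> tokens were still generated and billed/waited for.
    return re.sub(r"^<think>\s*</think>\s*", "", text)
```

Only disabling thinking in the chat template before generation avoids spending time on those tokens.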
The purpose of Qwen3's enable_thinking is to occupy the
Supported in #3641.