
fix: qwen3 nonstream parse with no or uncompleted think content

Open ywx217 opened this issue 6 months ago • 4 comments

Motivation

This PR fixes two behaviors of the qwen3 reasoning parser:

  • In non-stream mode, when thinking is disabled, the entire output is incorrectly returned as reasoning_content.
  • In non-stream mode, when the thinking content exceeds max_tokens, the parser cannot handle output that lacks the closing </think> token. (Mentioned in issue #3664)

Modification

  • qwen_qwq_reasoning_parser.py
    • When stream=False and enable_thinking=False, there is no <think> tag, so the whole model output should be treated as content, not reasoning_content.
    • When stream=False, enable_thinking=True, and max_tokens is small, incomplete think content is now parsed correctly.
  • test_qwen3_parser.py
    • related test cases added
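The two rules above can be sketched as a single non-stream split function. This is a minimal illustration, not the actual code in qwen_qwq_reasoning_parser.py; the function name, return shape, and tag handling here are assumptions for demonstration only.

```python
from typing import Optional, Tuple


def parse_nonstream(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Split non-stream model output into (reasoning_content, content).

    Hypothetical sketch of the parsing rules described in this PR:
    - no <think> tag at all (enable_thinking=False): everything is content
    - <think> without </think> (truncated by max_tokens): everything after
      the opening tag is (incomplete) reasoning_content
    - otherwise: split on </think> as usual
    """
    if '<think>' not in text:
        # thinking disabled: no tag, so the whole output is normal content
        return None, text
    after = text.split('<think>', 1)[1]
    if '</think>' not in after:
        # generation stopped before the think block closed
        return after, None
    reasoning, content = after.split('</think>', 1)
    return reasoning, content.lstrip('\n')
```

For example, `parse_nonstream('hello')` yields `(None, 'hello')`, while a truncated `'<think>partial reasoning'` yields the incomplete reasoning with no content.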

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

ywx217 avatar Jul 18 '25 09:07 ywx217

May provide reproducing code

lvhan028 avatar Jul 22 '25 07:07 lvhan028

@RunningLeon Is it fixed when you supported interns1 reasoning parser?

lvhan028 avatar Aug 06 '25 06:08 lvhan028

@RunningLeon Is it fixed when you supported interns1 reasoning parser?

The first problem below should be fixed. @ywx217 hi, as for the second one, does a user really need the incomplete thinking content in that case? If it is a reasonable use case, this would be included.

  • When stream=False and enable_thinking=False, there's no <think> tag, so the whole model output should be regarded as content, not reasoning_content.
  • When stream=False, enable_thinking=True and max_tokens is a small value, incomplete think content can be correctly parsed.

RunningLeon avatar Aug 07 '25 02:08 RunningLeon

@RunningLeon Is it fixed when you supported interns1 reasoning parser?

The first problem below should be fixed. @ywx217 hi, as for the second one, does a user really need the incomplete thinking content in that case? If it is a reasonable use case, this would be included.

  • When stream=False and enable_thinking=False, there's no <think> tag, so the whole model output should be regarded as content, not reasoning_content.
  • When stream=False, enable_thinking=True and max_tokens is a small value, incomplete think content can be correctly parsed.

The second case comes from the related issue (#3664); in my view, this case is rare.

ywx217 avatar Sep 01 '25 01:09 ywx217