fix: qwen3 non-stream parsing with missing or incomplete think content
Motivation
The following behaviors of the qwen3 reasoning parser are fixed:
- In non-stream mode, when thinking is disabled, the output is incorrectly returned as `reasoning_content`.
- In non-stream mode, when the thinking content exceeds `max_tokens`, the parser cannot handle output that is missing the `</think>` token (mentioned in issue #3664).
Modification
- `qwen_qwq_reasoning_parser.py`
  - When `stream=False` and `enable_thinking=False`, there is no `<think>` tag, so the whole model output should be regarded as `content`, not `reasoning_content`.
  - When `stream=False`, `enable_thinking=True` and `max_tokens` is a small value, incomplete think content can be correctly parsed (see the sketch after this list).
- `test_qwen3_parser.py`
  - Related test cases added.
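For context, here is a minimal sketch of the intended non-stream splitting behavior. It is not the actual implementation in `qwen_qwq_reasoning_parser.py`; the function name and the return convention are illustrative assumptions only.

```python
# Sketch of the intended non-stream behavior described above (illustrative,
# NOT the real qwen_qwq_reasoning_parser.py implementation).
from typing import Optional, Tuple


def split_reasoning(model_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning_content, content) for a complete (non-stream) output."""
    start, end = '<think>', '</think>'
    if end in model_output:
        # Complete think block: everything before </think> is reasoning,
        # everything after it is the final answer.
        reasoning, _, content = model_output.partition(end)
        reasoning = reasoning.replace(start, '', 1)
        return reasoning or None, content or None
    if start in model_output:
        # Truncated by max_tokens: <think> was opened but never closed, so
        # everything after <think> is (incomplete) reasoning content.
        reasoning = model_output.split(start, 1)[1]
        return reasoning or None, None
    # enable_thinking=False: no <think> tag at all, so the whole output is
    # regular content rather than reasoning_content.
    return None, model_output or None
```

With this convention, `split_reasoning("hello")` yields `(None, "hello")` for the disabled-thinking case, while a truncated `"<think>partial reason"` yields `("partial reason", None)` instead of raising or misclassifying the text.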
Checklist
- Pre-commit or other linting tools are used to fix the potential lint issues.
- The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
- If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
- The documentation has been modified accordingly, like docstring or example tutorials.
Reproducing code can be provided if needed.
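For example, a minimal reproduction sketch against an OpenAI-compatible lmdeploy `api_server` could look like the following; the URL, model name, and the `enable_thinking` key passed via `extra_body` are assumptions and may differ across versions.

```python
# Reproduction sketch (assumptions: a Qwen3 model served by lmdeploy api_server
# with the qwen-qwq reasoning parser enabled; port, model name and the
# enable_thinking extra_body key are illustrative).
from openai import OpenAI

client = OpenAI(base_url='http://localhost:23333/v1', api_key='none')

# Second case from the motivation: a small max_tokens truncates generation
# before </think> is emitted, which previously broke the non-stream parser.
resp = client.chat.completions.create(
    model='Qwen3-8B',
    messages=[{'role': 'user', 'content': 'Explain why the sky is blue.'}],
    max_tokens=16,
    stream=False,
    extra_body={'enable_thinking': True},
)
msg = resp.choices[0].message
# reasoning_content is a server-side extension field; getattr keeps the
# snippet safe if the client object does not expose it directly.
print('reasoning_content:', getattr(msg, 'reasoning_content', None))
print('content:', msg.content)
```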
@RunningLeon Is it fixed when you supported interns1 reasoning parser?
The first problem in the quote below should be fixed. @ywx217 hi, as for the second one, does a user really need the incomplete thinking_content in that case? If reasonable, it would be included.
- When `stream=False` and `enable_thinking=False`, there's no `<think>` tag, so the whole model output should be regarded as `content`, not `reasoning_content`.
- When `stream=False`, `enable_thinking=True` and `max_tokens` is a small value, incomplete think content can be correctly parsed.
The second case comes from the related issue (#3664); in my view, this case is rare.