
fix: qwen3 nonstream parse with no or uncompleted think content

Open ywx217 opened this issue 6 months ago • 4 comments

Motivation

This PR fixes two behaviors of the qwen3 reasoning parser:

  • In non-stream mode, when thinking is disabled, the entire output is incorrectly returned as reasoning_content.
  • In non-stream mode, when the thinking content exceeds max_tokens, the parser cannot handle output that lacks the closing </think> token. (Mentioned in issue #3664)

Modification

  • qwen_qwq_reasoning_parser.py
    • When stream=False and enable_thinking=False, there is no <think> tag, so the whole model output should be treated as content, not reasoning_content.
    • When stream=False, enable_thinking=True, and max_tokens is small, incomplete think content is now parsed correctly.
  • test_qwen3_parser.py
    • related test cases added
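The two rules above can be sketched as a single non-stream split function. This is a minimal illustration, not the actual code in qwen_qwq_reasoning_parser.py; the function name, return shape, and tag handling here are assumptions for demonstration only.

```python
from typing import Optional, Tuple


def parse_nonstream(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Split non-stream model output into (reasoning_content, content).

    Hypothetical sketch of the parsing rules described in this PR:
    - no <think> tag at all (enable_thinking=False): everything is content
    - <think> without </think> (truncated by max_tokens): everything after
      the opening tag is (incomplete) reasoning_content
    - otherwise: split on </think> as usual
    """
    if '<think>' not in text:
        # thinking disabled: no tag, so the whole output is normal content
        return None, text
    after = text.split('<think>', 1)[1]
    if '</think>' not in after:
        # generation stopped before the think block closed
        return after, None
    reasoning, content = after.split('</think>', 1)
    return reasoning, content.lstrip('\n')
```

For example, `parse_nonstream('hello')` yields `(None, 'hello')`, while a truncated `'<think>partial reasoning'` yields the incomplete reasoning with no content.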

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

ywx217 avatar Jul 18 '25 09:07 ywx217

May provide reproducing code

lvhan028 avatar Jul 22 '25 07:07 lvhan028

@RunningLeon Is it fixed when you supported interns1 reasoning parser?

lvhan028 avatar Aug 06 '25 06:08 lvhan028

@RunningLeon Is it fixed when you supported interns1 reasoning parser?

The first problem below should be fixed. @ywx217 hi, as for the second one, does a user really need the incomplete thinking content in that case? If it is a reasonable use case, this would be included.

  • When stream=False and enable_thinking=False, there's no <think> tag, so the whole model output should be regarded as content, not reasoning_content.
  • When stream=False, enable_thinking=True and max_tokens is a small value, incomplete think content can be correctly parsed.

RunningLeon avatar Aug 07 '25 02:08 RunningLeon

@RunningLeon Is it fixed when you supported interns1 reasoning parser?

The first problem below should be fixed. @ywx217 hi, as for the second one, does a user really need the incomplete thinking content in that case? If it is a reasonable use case, this would be included.

  • When stream=False and enable_thinking=False, there's no <think> tag, so the whole model output should be regarded as content, not reasoning_content.
  • When stream=False, enable_thinking=True and max_tokens is a small value, incomplete think content can be correctly parsed.

The second case comes from the related issue (#3664); in my view, this case is rare.

ywx217 avatar Sep 01 '25 01:09 ywx217