
Tokenization Inconsistency Warning for Qwen3-4B with ignore_strippable Mode in multi-turn interaction

Open TheUnknownThing opened this issue 8 months ago • 8 comments

Describe the bug

When running multi-turn rollout with a Qwen3 model (in my case Qwen/Qwen3-4B), the `Inconsistent training and inference tokenization detected` warning is still triggered, even when `tokenization_sanity_check_mode` is explicitly set to `ignore_strippable`. This seems to contradict the documentation's claim that Qwen3 is "well-handled."

I'm using the latest version of verl, freshly installed on Jun 23, 2025.

According to the documentation, Qwen3 series models have known tokenization quirks that are handled by a "fixed base conversation" fallback. My expectation is that this handling, combined with the ignore_strippable mode, should prevent this warning from appearing. The persistence of the warning suggests there might be a token mismatch that is more significant than just strippable whitespace, or the handling for Qwen3 is not working as expected.
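To make the distinction concrete, here is a conceptual sketch of what `ignore_strippable` means (this is illustrative, not verl's actual implementation; the function name is made up): diffs between the two tokenizations that consist only of leading/trailing whitespace are suppressed, while real token mismatches are still reported.

```python
# Conceptual sketch only (not verl's code): under "ignore_strippable",
# a diff chunk pair is treated as consistent if the chunks agree after
# stripping surrounding whitespace.
def is_strippable_diff(train_chunk: str, infer_chunk: str) -> bool:
    return train_chunk.strip() == infer_chunk.strip()

# Trailing-newline-only difference: suppressed by ignore_strippable.
print(is_strippable_diff("|im_end|>\n", "|im_end|>\n\n"))  # True
# Genuinely different content: still flagged.
print(is_strippable_diff("#### <answer>2", "<think>\nOkay"))  # False
```

The warning in this issue therefore implies a mismatch beyond strippable whitespace.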

To further confirm that the warning is not triggered by my own code, I modified examples/sglang_multiturn/run_qwen2.5-0.5b_gsm8k_multiturn_w_interaction.sh to use the Qwen/Qwen3-4B model, and it still triggers the bug.

To Reproduce

Steps to reproduce the behavior:

  1. Set the examples/sglang_multiturn/run_qwen2.5-0.5b_gsm8k_multiturn_w_interaction.sh to use a Qwen3 model.
  2. Use a rollout configuration based on the example provided.
  3. In the shell script, set the following:
    actor_rollout_ref.rollout.multi_turn.tokenization_sanity_check_mode=ignore_strippable \
    
  4. Run the example interaction script: bash examples/sglang_multiturn/run_qwen2.5-0.5b_gsm8k_multiturn_w_interaction.sh.
  5. Observe the warning Inconsistent training and inference tokenization detected... in the log output.

Expected behavior

The script should run without triggering the "Inconsistent training and inference tokenization" warning. The combination of the special handling for Qwen3 models and the ignore_strippable mode should account for any known discrepancies.

Log Output

(WorkerDict pid=2268211) idx 2390:2400 -> 6379:7486 | full_prompt_chunk: '|im_end|>\n' | current_prompt_chunk: '|im_end|>\n<|im_start|>user\nYour response is incorrect! You need to reflect on your answer and try again.<|im_end|>\n<|im_start|>assistant\n<think>\nOkay, let me try to figure this out again. The problem says 100 people apply. 30% get interviews, so 30 people. Then 20% of those 30 get offers. 20% of 30 is 6. Then, a third of the people who get the offer accept it. So 6 divided by 3 is 2. That seems right. But maybe the question is asking for something else? Wait, maybe "a third of the people" refers to the total applicants? No, the question says "of those who receive a job offer," so it\'s 1/3 of 6. Hmm. Maybe I should check the calculations again. 100 * 0.3 = 30. 30 * 0.2 = 6. 6 * (1/3) = 2. Yeah, that\'s correct. Maybe the user is wrong? Or perhaps there\'s a different interpretation. Wait, could "a third of the people" mean 33.33% instead of exactly 1/3? But the question says "a third," which is 1/3. So 2 is the right answer. I think the answer is correct. Maybe the user\'s initial answer was wrong, but I\'m confident in my steps. So I\'ll stick with 2.\n</think>\n\n#### <answer>2</answer><|im_end|>\n'
(WorkerDict pid=2268211) [2025-06-24 03:23:36] Inconsistent training and inference tokenization detected (ignore_strippable). This may lead to unexpected behavior during training. Please review your chat template to determine if this is intentional. For more information, refer to the multiturn README.md. [repeated 3x across cluster]
(WorkerDict pid=2268211) [2025-06-24 03:23:36] Showing 10 characters before and after the diffs for context and better readability. [repeated 3x across cluster]
(WorkerDict pid=2268211) [2025-06-24 03:23:36] Found differences: [repeated 3x across cluster]
(WorkerDict pid=2268211) idx 615:639 -> 615:648 | full_prompt_chunk: 'assistant\n#### <answer>2' | current_prompt_chunk: "assistant\n<think>\nOkay, let's see" [repeated 2x across cluster]

TheUnknownThing avatar Jun 24 '25 03:06 TheUnknownThing

Hi, I encountered the same warning message as yours. I'm using verl 0.5.0, the latest official branch, running sglang multi-turn GRPO with Qwen3-4B. Still waiting for an official reply.

(WorkerDict pid=4164350) idx 2384:2404 -> 2384:2423 | full_prompt_chunk: '>assistant\n\nO' | current_prompt_chunk: '>assistant\n\n\n\n\n\nO'
(WorkerDict pid=4164350) idx 21820:21840 -> 21839:21878 | full_prompt_chunk: 't\n\nOkay, so I' | current_prompt_chunk: 't\n\n\n\n\n\nOkay, so I'
(WorkerDict pid=4164350) idx 37454:37474 -> 37492:37531 | full_prompt_chunk: 't\n\nOkay, so I' | current_prompt_chunk: 't\n\n\n\n\n\nOkay, so I'
(WorkerDict pid=4164350) [2025-07-28 20:54:27] Inconsistent training and inference tokenization detected (strict). This may lead to unexpected behavior during training. Please review your chat template to determine if this is intentional. For more information, refer to the multiturn README.md.

lebronjamesking avatar Jul 28 '25 12:07 lebronjamesking

I encountered the same warning too.

jhrsya avatar Sep 02 '25 12:09 jhrsya

same issue

jiani-huang avatar Sep 06 '25 00:09 jiani-huang

same issue for Qwen3-4B-Instruct

ZechuanWang avatar Sep 22 '25 08:09 ZechuanWang

same issue, for both v0.4.1 and v0.5.0, any update?

MYC000801 avatar Sep 30 '25 05:09 MYC000801

same issue for Qwen3-4B-Instruct

We also encountered this Tokenization Inconsistency Warning and have managed to find a solution for our use case. I'm sharing our findings here in case they can help others.

Our Environment:

Verl: v0.4.1
Model: Qwen/Qwen3-4B-Instruct-2507

Root Cause: The problem was traced to an outdated chat template (tokenizer_config.json) associated with the model we downloaded. It seems the official template for this model was updated at some point, but we were using an older version during our training.


The old template had specific logic to handle the assistant's replies, particularly for parsing reasoning content within `<think>` tags:

```jinja
{%- elif message.role == "assistant" %}
    {#- Extract the reasoning part (prioritize the field, then slice from content) #}
    {%- set reasoning_content = '' %}
    {%- if message.reasoning_content is string %}
        {%- set reasoning_content = message.reasoning_content %}
    {%- else %}
        {%- if '</think>' in content %}
            {%- set reasoning_content = content.split('</think>')[0]
                                           .rstrip('\n')
                                           .split('<think>')[-1]
                                           .lstrip('\n') %}
            {%- set content = content.split('</think>')[-1].lstrip('\n') %}
        {%- endif %}
    {%- endif %}
```
This logic attempts to split the content on the `<think>`/`</think>` tags. In our scenario, the part of the assistant's message after `</think>` might contain only newlines (e.g., \n\n), which this template processes into an empty string. This manipulation causes a mismatch between the tokenization of the training prompt chunk and the full prompt, leading to the warning.
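The stripping behavior can be reproduced in plain Python. This is a minimal sketch mirroring the old Jinja branch above (not the actual template engine): the text after `</think>` is sliced out and `lstrip`'d, so a reply whose post-`</think>` part is only newlines is rendered as empty.

```python
# Plain-Python mirror of the old template's assistant branch (illustrative only).
def old_assistant_content(content: str) -> str:
    if '</think>' in content:
        # Keep only the part after </think>, with leading newlines stripped.
        content = content.split('</think>')[-1].lstrip('\n')
    return content

# The rollout engine generated these exact characters, but re-rendering them
# for training drops the trailing newlines entirely -> token mismatch.
print(repr(old_assistant_content("<think>\nsome reasoning\n</think>\n\n")))  # -> ''
```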

Solution: We noticed that the official Qwen repository now provides an updated chat_template.jinja with much simpler logic for the assistant's role:

```jinja
{%- elif message.role == "assistant" %}
    {{- '<|im_start|>' + message.role + '\n' + content }}
```

This new template no longer tries to parse the content; it simply formats the message as-is.
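For contrast, here is the same kind of plain-Python sketch of the new branch (illustrative, not the template engine itself): the generated text is concatenated verbatim, so re-rendering an assistant turn reproduces exactly the string the rollout engine produced.

```python
# Plain-Python mirror of the updated template's assistant branch (illustrative only).
def new_assistant_render(role: str, content: str) -> str:
    # No <think> parsing: role header plus content, verbatim.
    return '<|im_start|>' + role + '\n' + content

generated = "<think>\nreasoning\n</think>\n\nfinal answer"
rendered = new_assistant_render("assistant", generated)
# The generated text survives untouched, so training and inference
# tokenize the same string.
assert generated in rendered
```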

By replacing our old tokenizer config file with this new version, the tokenization inconsistency warning was completely resolved.

Hope this helps anyone else facing this issue.

A special thanks to @HongxuanZhang for his invaluable help in diagnosing and fixing this!

ZechuanWang avatar Oct 14 '25 02:10 ZechuanWang

Thanks for this detailed analysis, @ZechuanWang. I also dug into this a bit and found that SGLang doesn't read the chat template. I'm getting the following warning from SGLang:

(WorkerDict pid=1437287) [2025-11-16 12:18:31] No chat template found, defaulting to 'string' content format

I'm not sure how I can pass the template to SGLang.

FabianSchuetze avatar Nov 16 '25 09:11 FabianSchuetze

@FabianSchuetze I also found this message in my log and would also like to know how to solve it.

Nidryen-zh avatar Nov 21 '25 02:11 Nidryen-zh