Fix qwen image prompt padding #12075
Fix QwenImage Prompt Embedding Padding for Deterministic Outputs
What was the issue?
Issue #12075 reported that QwenImage pipelines were producing non-deterministic outputs when using the same prompt across different batch sizes. The same text prompt would generate different images depending on whether it was batched alone or with other prompts of varying lengths.
This inconsistency violated a fundamental expectation: identical prompts with the same seed should always produce identical outputs, regardless of batch composition.
How I identified the problem
After reviewing the issue report and examining the QwenImage pipeline implementation, I discovered the root cause in the prompt embedding padding logic.
The pipelines were dynamically padding prompt embeddings to the maximum sequence length within each batch, rather than using a fixed padding length. This meant:
- A short prompt batched alone would be padded to its own length
- The same short prompt batched with a longer prompt would be padded to the longer prompt's length
- Different padding created different Rotary Position Embedding (RoPE) position assignments
- RoPE uses a shared position space for text and image tokens, so inconsistent text positions led to inconsistent image generation
The problem existed across all 8 QwenImage pipeline variants (main, img2img, inpaint, edit, edit_inpaint, edit_plus, controlnet, controlnet_inpaint) and the modular encoder functions.
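The mechanism above can be sketched in a few lines of plain Python (this is an illustration of the padding behavior, not the actual diffusers code):

```python
# Why batch-maximum padding is non-deterministic: the padded length of a
# prompt depends on which other prompts happen to share its batch.

def pad_to_batch_max(token_lists, pad_id=0):
    """Dynamic padding: length is that of the longest prompt in the batch."""
    max_len = max(len(t) for t in token_lists)
    return [t + [pad_id] * (max_len - len(t)) for t in token_lists]

def pad_to_fixed(token_lists, max_sequence_length=512, pad_id=0):
    """Fixed padding: length is always max_sequence_length."""
    return [t[:max_sequence_length] + [pad_id] * (max_sequence_length - len(t))
            for t in token_lists]

short = [1, 2, 3]             # a short prompt, 3 tokens
longer = list(range(10))      # a longer prompt, 10 tokens

alone = pad_to_batch_max([short])          # short prompt padded to 3
mixed = pad_to_batch_max([short, longer])  # same prompt now padded to 10
print(len(alone[0]), len(mixed[0]))        # 3 10 -> different RoPE positions

fixed_alone = pad_to_fixed([short], max_sequence_length=16)
fixed_mixed = pad_to_fixed([short, longer], max_sequence_length=16)
print(len(fixed_alone[0]) == len(fixed_mixed[0]))  # True -> deterministic
```

With batch-maximum padding, the short prompt's tokens occupy different positions in the shared RoPE space depending on its batch mates; with fixed-length padding they always occupy the same positions.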
How I solved it
The solution ensures all prompt embeddings are padded to a consistent, fixed length determined by the `max_sequence_length` parameter (default 512, configurable up to the model's 1024-token limit).
I modified the padding logic in all affected locations to:

- Use fixed-length padding: Changed from batch-maximum padding to `max_sequence_length` padding
- Propagate the parameter: Added a `max_sequence_length` parameter to all internal prompt encoding methods
- Update RoPE sequence lengths: Changed `txt_seq_lens` to reflect the padded length instead of the actual token counts
- Handle vision tokens: Added truncation logic in the Edit pipelines, where image tokens are processed through the text encoder
- Maintain backward compatibility: Kept the default `max_sequence_length=512` to preserve existing behavior for users
The fix ensures that any prompt will always receive the same padding and RoPE positions, regardless of batch composition, making outputs deterministic and reproducible.
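A minimal sketch of the fixed-length padding step (the helper name and signature here are illustrative, not the actual diffusers API):

```python
import torch

def pad_prompt_embeds(prompt_embeds, prompt_mask, max_sequence_length=512):
    """Pad (or truncate) prompt embeddings and mask to a fixed length so
    RoPE positions no longer depend on batch composition."""
    batch, seq_len, _ = prompt_embeds.shape
    if seq_len < max_sequence_length:
        pad = max_sequence_length - seq_len
        # Pad the sequence dimension; embedding dimension is untouched
        prompt_embeds = torch.nn.functional.pad(prompt_embeds, (0, 0, 0, pad))
        prompt_mask = torch.nn.functional.pad(prompt_mask, (0, pad))
    else:
        prompt_embeds = prompt_embeds[:, :max_sequence_length]
        prompt_mask = prompt_mask[:, :max_sequence_length]
    # RoPE should see the padded length, not the true token count
    txt_seq_lens = [max_sequence_length] * batch
    return prompt_embeds, prompt_mask, txt_seq_lens

embeds = torch.randn(2, 7, 64)            # two prompts, 7 tokens each
mask = torch.ones(2, 7, dtype=torch.long)
embeds, mask, txt_seq_lens = pad_prompt_embeds(embeds, mask, max_sequence_length=16)
print(embeds.shape, txt_seq_lens)         # torch.Size([2, 16, 64]) [16, 16]
```

Reporting `txt_seq_lens` as the padded length (rather than the actual token count) is what keeps the text/image position assignments identical across batches.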
How the fix was tested
I created a test, `test_prompt_embeds_padding()`, that verifies three critical behaviors:
- Fixed padding to `max_sequence_length`: Confirms short prompts are padded to the full length (not just to their token count)
- Batch consistency: Validates that prompts in a mixed-length batch all receive the same padding length
- Custom `max_sequence_length` support: Tests that specifying a custom value (e.g., 512) correctly truncates and pads to that length
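The three checks above can be sketched as follows (the `encode` stand-in below is hypothetical and only mimics the fixed-padding contract; the real test exercises the pipeline's encoder):

```python
import torch

def encode(prompts, max_sequence_length=512, dim=8):
    """Stand-in encoder: always returns fixed-length, padded embeddings."""
    out = torch.zeros(len(prompts), max_sequence_length, dim)
    for i, p in enumerate(prompts):
        n = min(len(p.split()), max_sequence_length)
        out[i, :n] = 1.0  # placeholder for real token embeddings
    return out

short = "a cat"
long_prompt = "a very long and detailed prompt about a cat in a garden"

# 1. Short prompts are padded to the full max_sequence_length
assert encode([short], max_sequence_length=32).shape[1] == 32

# 2. Mixed-length batches all share the same padded length
batch = encode([short, long_prompt], max_sequence_length=32)
assert batch.shape == (2, 32, 8)

# 3. A custom max_sequence_length truncates/pads to that length
assert encode([long_prompt], max_sequence_length=4).shape[1] == 4
print("all padding checks passed")
```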
Additionally, I ran the entire QwenImage test suite to ensure no regressions were introduced. All structural tests pass, with only the expected changes to value assertions (fixing the padding changes the numerical outputs).
Test Results
Fixes #12075
cc @sayakpaul @yiyixuxu
Can I get some initial reviews on this, @sayakpaul?
@yiyixuxu could you please give me an initial review on this PR? I'll start making changes accordingly.
Hi @yiyixuxu @sayakpaul, I was wondering if I could get some initial comments and reviews on this, please? I'll make the necessary changes accordingly.