How to use a quantizer after the pipeline is loaded?

Open DefTruth opened this issue 2 months ago • 1 comments

Currently

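import os

import torch
from diffusers import QwenImagePipeline

# Quantization occurs at load time.
# `args`, `scheduler`, and `quantization_config` come from the surrounding script (not shown).
pipe = QwenImagePipeline.from_pretrained(
    args.model_path
    if args.model_path is not None
    else os.environ.get("QWEN_IMAGE_DIR", "Qwen/Qwen-Image"),
    scheduler=scheduler,
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)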

What I want

# Load on CPU -> load and fuse LoRA -> quantize -> move to GPU

Dec 11 '25 06:12 DefTruth

Hi, the quantization config you're using is applied at the pipeline level. What you need is module-level quantization: quantize the models first and then pass them to the pipeline. You can look at an example here
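A minimal sketch of the module-level approach, for illustration only. It assumes bitsandbytes is installed and picks 4-bit NF4 as the quantization scheme; neither choice comes from this thread, and the helper name `build_pipe_with_quantized_transformer` is hypothetical.

```python
def build_pipe_with_quantized_transformer(model_id="Qwen/Qwen-Image"):
    """Quantize only the transformer, then hand it to the pipeline."""
    # Imports are deferred so the heavy dependencies load only when called.
    import torch
    from diffusers import (
        BitsAndBytesConfig,
        QwenImagePipeline,
        QwenImageTransformer2DModel,
    )

    # Assumption: 4-bit NF4 via bitsandbytes; swap in whichever backend you use.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # Load and quantize the transformer on its own.
    transformer = QwenImageTransformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )

    # The pre-quantized module replaces the pipeline's default transformer.
    return QwenImagePipeline.from_pretrained(
        model_id,
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )
```

Note that this variant still quantizes while the transformer is being loaded, so on its own it does not leave room to fuse a LoRA into the unquantized weights first.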

Depending on the LoRA (as in, if it doesn't have text encoder keys), you can just apply the LoRA to the transformer, fuse it, and then quantize it. Another approach is to load the pipeline, load the LoRA, fuse it, and then apply the quantization to pipe.transformer
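A rough sketch of the second approach (load the pipeline, load and fuse the LoRA, then quantize `pipe.transformer`). It assumes torchao as the post-load quantizer with int8 weight-only quantization, which this thread does not specify, and it has not been verified against Qwen-Image. The helper name `load_fuse_then_quantize` and its parameters are hypothetical.

```python
def load_fuse_then_quantize(model_id, lora_path, device="cuda"):
    """Load on CPU -> load and fuse LoRA -> quantize -> move to GPU."""
    # Imports are deferred so the heavy dependencies load only when called.
    import torch
    from diffusers import QwenImagePipeline
    from torchao.quantization import Int8WeightOnlyConfig, quantize_

    # 1. Load unquantized weights; they stay on CPU until .to() is called.
    pipe = QwenImagePipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # 2. Load the LoRA and merge it into the base weights.
    pipe.load_lora_weights(lora_path)
    pipe.fuse_lora()
    pipe.unload_lora_weights()  # the adapter layers are redundant after fusing

    # 3. Quantize the fused transformer in place.
    #    Assumption: int8 weight-only; other torchao configs plug in the same way.
    quantize_(pipe.transformer, Int8WeightOnlyConfig())

    # 4. Move the whole pipeline to the target device.
    return pipe.to(device)
```

A call might look like `pipe = load_fuse_then_quantize("Qwen/Qwen-Image", "path/to/lora.safetensors")`, where the LoRA path is a placeholder. Fusing before quantizing matters because the LoRA delta is merged into full-precision weights; merging into already-quantized weights is generally not supported or loses precision.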

Dec 11 '25 14:12 asomoza