
How to use quantizer after pipeline loaded?

Open DefTruth opened this issue 2 months ago • 1 comments


  • Currently
# Quantization occurs at load time.
pipe = QwenImagePipeline.from_pretrained(
    (
        args.model_path
        if args.model_path is not None
        else os.environ.get(
            "QWEN_IMAGE_DIR",
            "Qwen/Qwen-Image",
        )
    ),
    scheduler=scheduler,
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)
  • What I want
# Load on CPU -> load and fuse LoRA -> quantize -> move to GPU

DefTruth avatar Dec 11 '25 06:12 DefTruth

Hi, the quantization you're using is applied at the pipeline level. What you need is module-level quantization: quantize the models first and then pass them to the pipeline. You can look at an example here
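For readers following along, module-level quantization might look roughly like the sketch below. It is not the linked example. It assumes the bitsandbytes backend with 4-bit NF4 settings, which are illustrative choices, and the function name is made up for this sketch. Imports are kept inside the function so that nothing heavy is loaded until it is called.

```python
def build_pipeline_with_quantized_transformer(model_id="Qwen/Qwen-Image"):
    """Quantize only the transformer, then hand it to the pipeline."""
    # Local imports: torch, diffusers and bitsandbytes are only needed at call time.
    import torch
    from diffusers import (
        BitsAndBytesConfig,
        QwenImagePipeline,
        QwenImageTransformer2DModel,
    )

    # Assumption: 4-bit NF4 via bitsandbytes; any supported backend could be used.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # Quantization happens while this single module is loaded.
    transformer = QwenImageTransformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )

    # The pipeline reuses the already-quantized transformer instead of loading its own.
    return QwenImagePipeline.from_pretrained(
        model_id,
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )
```

Note that this still quantizes at load time, so on its own it does not give the "fuse LoRA first, then quantize" order asked for in the question.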

Depending on the LoRA (for example, if it doesn't have text encoder keys), you can just apply the LoRA to the transformer, fuse it, and then quantize it. Another approach is to load the pipeline, load the LoRA, fuse it, and then apply the quantization to pipe.transformer.
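A minimal sketch of the second approach (load, fuse, quantize, then move to GPU) could look like this. The thread does not say which quantization library to use after loading, so this assumes optimum-quanto with int8 weights as one possible choice. `lora_path` is a placeholder and the function name is hypothetical.

```python
def load_fuse_quantize(
    model_id="Qwen/Qwen-Image",
    lora_path="path/to/lora.safetensors",  # placeholder, not a real file
    device="cuda",
):
    """Load on CPU -> load and fuse LoRA -> quantize transformer -> move to device."""
    # Local imports: torch, diffusers and optimum-quanto are only needed at call time.
    import torch
    from diffusers import QwenImagePipeline
    from optimum.quanto import freeze, qint8, quantize

    # 1. Load the unquantized pipeline; weights stay on CPU by default.
    pipe = QwenImagePipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # 2. Load the LoRA and merge it into the base weights.
    pipe.load_lora_weights(lora_path)
    pipe.fuse_lora()
    # The adapter layers are no longer needed once fused.
    pipe.unload_lora_weights()

    # 3. Quantize the fused transformer in place (assumption: int8 weights via quanto).
    quantize(pipe.transformer, weights=qint8)
    freeze(pipe.transformer)

    # 4. Move everything to the target device.
    pipe.to(device)
    return pipe
```

Fusing before quantizing means the LoRA delta is merged into full-precision weights, so only the final merged weights go through quantization.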

asomoza avatar Dec 11 '25 14:12 asomoza