diffusers
How to use quantizer after pipeline loaded?
- Currently
# Quantization occurs at load time.
pipe = QwenImagePipeline.from_pretrained(
    (
        args.model_path
        if args.model_path is not None
        else os.environ.get(
            "QWEN_IMAGE_DIR",
            "Qwen/Qwen-Image",
        )
    ),
    scheduler=scheduler,
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)
- What I want
# Load on CPU -> load and fuse LoRA -> quantize -> move to GPU
Hi, the quantization config you're using is applied at the pipeline level. What you need is module-level quantization: quantize the models first and then pass them to the pipeline. You can look at an example here
Depending on the LoRA (for example, if it doesn't have text encoder keys), you can just apply the LoRA to the transformer, fuse it, and then quantize it. Another approach is to load the pipeline, load the LoRA, fuse it, and then quantize pipe.transformer.
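For reference, the second approach could look roughly like the sketch below. This is only an illustration, not a confirmed recipe for Qwen-Image: `fuse_lora_then_quantize` and `quanto_int8` are hypothetical helper names, the LoRA path is a placeholder, and optimum-quanto is just one library that can quantize an already-loaded module in place (it is an assumption that it suits this model). The pipeline is loaded without `quantization_config`, so it stays in bf16 on CPU until the LoRA is fused.

```python
from typing import Any, Callable


def fuse_lora_then_quantize(
    pipe: Any,
    lora_path: str,
    quantize_fn: Callable[[Any], None],
    device: str = "cuda",
) -> Any:
    """Hypothetical helper: fuse a LoRA into a loaded pipeline, then quantize."""
    # 1. Load the LoRA while the weights are still unquantized (bf16, on CPU).
    pipe.load_lora_weights(lora_path)
    # 2. Merge the LoRA deltas into the base weights, then drop the adapter layers.
    pipe.fuse_lora()
    pipe.unload_lora_weights()
    # 3. Quantize only the transformer, in place, after fusing.
    quantize_fn(pipe.transformer)
    # 4. Move the whole pipeline to the target device.
    return pipe.to(device)


def quanto_int8(module: Any) -> None:
    """One possible quantize_fn, assuming optimum-quanto is installed."""
    from optimum.quanto import freeze, qint8, quantize

    quantize(module, weights=qint8)  # swap supported layers for quantized ones
    freeze(module)  # materialize the int8 weights


# Usage (assumed model id and placeholder LoRA path; note: no quantization_config):
#   import torch
#   from diffusers import QwenImagePipeline
#   pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
#   pipe = fuse_lora_then_quantize(pipe, "path/to/lora.safetensors", quanto_int8)
```

Passing the quantization step in as a callable keeps the order of operations explicit and makes it easy to swap in a different backend if quanto does not work well for this transformer.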