Failed to load LoRA in int8 mode
Let me give some background. I use the Fill model to change the background of a picture. The transformer is quantized with torchao for inference, but loading a LoRA on the quantized model fails; if the model is not quantized, the LoRA loads fine.
```python
import torch
from diffusers import FluxFillPipeline, FluxTransformer2DModel
from torchao.quantization import quantize_, int8_weight_only, float8_weight_only

transformer = FluxTransformer2DModel.from_pretrained(
    model_path,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
quantize_device = DEVICE_ID

# int8 loading
quantize_(
    transformer,
    int8_weight_only(),
    device=quantize_device,  # quantize on the GPU to speed this up
)

# fp8 loading
# quantize_(
#     transformer,
#     float8_weight_only(),
#     device=quantize_device,  # quantize on the GPU to speed this up
# )

self.pipe = FluxFillPipeline.from_pretrained(
    model_path,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

if is_add_loramodel:
    self.pipe.load_lora_weights(
        "/Flux-Midjourney-Mix2-LoRA/", weight_name="mjV6.safetensors"
    )
    self.pipe.fuse_lora(lora_scale=1.2)

# self.pipe.to("cuda:0")  # fast but VRAM-hungry: full run in under 10 s, whole model resident on the GPU
self.pipe.enable_model_cpu_offload(gpu_id=pipe_gpu_id)  # slower and uses system RAM, but saves VRAM; almost no VRAM used when idle
```
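For context on what `int8_weight_only()` does to each `Linear` in the transformer: the bf16 weight is replaced by int8 values plus per-channel scales, and the matmul dequantizes on the fly. A minimal hand-rolled sketch of that idea (not torchao's actual implementation, which uses tensor subclasses):

```python
import torch
import torch.nn.functional as F

def quantize_weight_int8(w: torch.Tensor):
    """Symmetric per-output-channel int8 quantization of a Linear weight."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    scale = scale.clamp(min=1e-8)  # avoid division by zero for all-zero rows
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def int8_linear(x, q, scale, bias=None):
    """Dequantize the int8 weight on the fly, then do a normal matmul."""
    w = q.to(x.dtype) * scale.to(x.dtype)
    return F.linear(x, w, bias)

w = torch.randn(64, 32)
x = torch.randn(4, 32)
q, s = quantize_weight_int8(w)
y_ref = F.linear(x, w)       # full-precision reference
y_q = int8_linear(x, q, s)   # int8 weight-only result, small quantization error
```

Because torchao swaps the weight for a quantized tensor subclass, diffusers wraps such layers specially when a LoRA is loaded on top, which is where the `TorchaoLoraLinear` error below comes from.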
```
TypeError: TorchaoLoraLinear.__init__() missing 1 required keyword-only argument: 'get_apply_tensor_subclass'
```
My requirements are mainly:
1. Call the Fill model.
2. Quantized inference to save VRAM, ideally under 20 GB at 1024 resolution.
3. Be able to mount a LoRA, because the background details in raw Flux outputs are poor.