Remove `.to()` restriction for 4-bit models

Open · SunMarc opened this pull request 1 year ago · 3 comments

What does this PR do?

Since bnb 0.43.0, you can freely move bnb models across devices. This PR removes the restriction we had put in place. Needs to be tested. cc @matthewdouglas
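A minimal sketch of the now-permitted flow, assuming bitsandbytes >= 0.43.0, a transformers build with this change, and a single CUDA device (the model id is illustrative):

```python
# Sketch: moving a 4-bit bitsandbytes model across devices.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative model id
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map={"": 0},  # place the whole model on GPU 0
)

# Before this PR, .to() raised for 4-bit models; with the restriction
# removed, the quantized weights should round-trip across devices.
model = model.to("cpu")
model = model.to("cuda:0")
```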

SunMarc · Aug 26 '24 13:08

Thanks @SunMarc! I've tested moving between gpu->cpu->gpu, but not yet on multiple GPUs. We'll still see a warning from accelerate:

You shouldn't move a model that is dispatched using accelerate hooks.
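For context, that warning comes from accelerate's dispatch hooks rather than from the 4-bit restriction itself; a sketch of a setup that triggers it, assuming a device_map-dispatched model (the model id is illustrative):

```python
# Sketch: a model loaded with a device_map is dispatched via accelerate
# hooks, so moving it with .to() logs the warning quoted above even
# though the move itself now succeeds for 4-bit weights.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative model id
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",  # dispatched via accelerate hooks
)

model = model.to("cpu")  # works, but accelerate logs the warning
```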

matthewdouglas · Aug 26 '24 15:08

Reference note: this should fix #24540 for 4-bit.

For 8-bit there is still a blocker: bitsandbytes-foundation/bitsandbytes#1332; once that's fixed and released on the bitsandbytes side, we can do a follow-up PR.
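The analogous 8-bit move stays blocked until then; a sketch of the case that a follow-up PR would unblock (again with an illustrative model id):

```python
# Sketch: the 8-bit counterpart, still restricted pending
# bitsandbytes-foundation/bitsandbytes#1332.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative model id
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map={"": 0},  # place the whole model on GPU 0
)

model = model.to("cpu")  # still expected to raise for 8-bit models
```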

matthewdouglas · Aug 26 '24 16:08

Just to let you all know, this change breaks those of us using Quanto instead of bitsandbytes. Yeah, I know, Quanto seems to be the ugly duckling of quantizers. Before this update I could move my pipeline off of 'cuda' to the 'cpu'. After changing my code to reflect this, the first time I offload:

`pipe = pipe.to(dtype=torch.float32, device='cpu', silence_dtype_warnings=True)`

and then come back in to reuse my saved pipe:

`pipe = FluxPipeline.from_pipe(pipe)`

badabing badaboom! I get an error (screenshot not captured here). So it seems you don't want me offloading a quantized file for Quanto?
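A sketch of the reported reproduction, assuming diffusers' FluxPipeline and optimum-quanto; the model id and the qint8 weight setting are assumptions, not taken from the report:

```python
# Sketch: Quanto-quantized Flux pipeline offloaded to CPU, then reused.
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qint8, quantize

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative model id
    torch_dtype=torch.bfloat16,
)
quantize(pipe.transformer, weights=qint8)  # quantization details assumed
freeze(pipe.transformer)
pipe = pipe.to("cuda")

# Offload as in the report, then rebuild a pipeline from the saved one;
# the second step is where the error was reported.
pipe = pipe.to(dtype=torch.float32, device="cpu", silence_dtype_warnings=True)
pipe = FluxPipeline.from_pipe(pipe)
```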

ukaprch · Feb 28 '25 20:02