Remove the `.to()` restriction for 4-bit models
What does this PR do?
Since bnb 0.43.0, you can freely move bnb models across devices. This PR removes the restriction we put in place. It still needs to be tested. cc @matthewdouglas
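For illustration, a minimal sketch of the kind of movement this enables; the checkpoint and config below are examples, not part of this PR:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative checkpoint; any bnb 4-bit quantized model should behave the same.
# Requires a CUDA device and bitsandbytes >= 0.43.0.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16),
)

# With this PR, the 4-bit weights can be moved across devices;
# previously these calls raised an error.
model = model.to("cuda")
model = model.to("cpu")
model = model.to("cuda")
```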
Thanks @SunMarc! I've tested moving between gpu->cpu->gpu, but not yet on multiple GPUs. We'll still see a warning from accelerate:
You shouldn't move a model that is dispatched using accelerate hooks.
Reference note: this should fix #24540 for 4-bit.
For 8-bit there is still a blocker: bitsandbytes-foundation/bitsandbytes#1332; once that's fixed and released on the bitsandbytes side we can follow up with an additional PR.
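For completeness, a sketch of the 8-bit counterpart that is still blocked; the checkpoint is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_8bit = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

# Unlike the 4-bit case above, moving this model across devices is still
# restricted until bitsandbytes-foundation/bitsandbytes#1332 is fixed and released.
# model_8bit = model_8bit.to("cpu")
```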
Just to let you all know, this change breaks things for those of us using Quanto instead of bitsandbytes. Yeah, I know, Quanto seems to be the ugly duckling of quantizers. Before this update I could move my pipeline off of 'cuda' to the 'cpu'. After updating my code to reflect this change, the first time I offload:
pipe = pipe.to(dtype=torch.float32, device='cpu', silence_dtype_warnings=True)
and then come back later to reuse my saved pipe:
pipe = FluxPipeline.from_pipe(pipe)
badabing badaboom!
I get this error:
So it seems you don't want me offloading a Quanto-quantized model?
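Roughly, the flow looks like this (the checkpoint and the exact Quanto calls below are illustrative, not my actual code):

```python
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize

# Illustrative setup: quantize the Flux transformer with Quanto.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
pipe = pipe.to("cuda")

# ... run some generations, then offload the whole pipeline to the CPU ...
pipe = pipe.to(dtype=torch.float32, device='cpu', silence_dtype_warnings=True)

# ... and later rebuild from the saved pipe; this is the step that now errors.
pipe = FluxPipeline.from_pipe(pipe)
```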