Remove the `.to()` restriction for 4-bit models
What does this PR do?
Since bnb 0.43.0, you can freely move bnb models across devices. This PR removes the restriction we put in place. It still needs to be tested. cc @matthewdouglas
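For illustration, a minimal sketch of the kind of movement this enables; the checkpoint and config below are examples, not part of this PR:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative checkpoint; any bnb 4-bit quantized model should behave the same.
# Requires a CUDA device and bitsandbytes >= 0.43.0.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16),
)

# With this PR, the 4-bit weights can be moved across devices;
# previously these calls raised an error.
model = model.to("cuda")
model = model.to("cpu")
model = model.to("cuda")
```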
Thanks @SunMarc! I've tested moving between gpu->cpu->gpu, but not yet on multiple GPUs. We'll still see a warning from accelerate:
You shouldn't move a model that is dispatched using accelerate hooks.
Reference note: this should fix #24540 for 4-bit.
For 8-bit there is still a blocker: bitsandbytes-foundation/bitsandbytes#1332; once that's fixed and released on the bitsandbytes side we can follow up with an additional PR.
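For completeness, a sketch of the 8-bit counterpart that is still blocked; the checkpoint is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_8bit = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

# Unlike the 4-bit case above, moving this model across devices is still
# restricted until bitsandbytes-foundation/bitsandbytes#1332 is fixed and released.
# model_8bit = model_8bit.to("cpu")
```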
Just to let you all know, this change breaks things for those of us using Quanto instead of bitsandbytes. Yeah, I know, Quanto seems to be the ugly duckling of quantizers. Before this update I could move my pipeline off of 'cuda' to the 'cpu'. After updating my code to reflect this change, the first time I offload:
pipe = pipe.to(dtype=torch.float32, device='cpu', silence_dtype_warnings=True)
and then come back later to reuse my saved pipe:
pipe = FluxPipeline.from_pipe(pipe)
badabing badaboom!
I get this error:
So it seems you don't want me offloading a Quanto-quantized model?
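Roughly, the flow looks like this (the checkpoint and the exact Quanto calls below are illustrative, not my actual code):

```python
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize

# Illustrative setup: quantize the Flux transformer with Quanto.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
pipe = pipe.to("cuda")

# ... run some generations, then offload the whole pipeline to the CPU ...
pipe = pipe.to(dtype=torch.float32, device='cpu', silence_dtype_warnings=True)

# ... and later rebuild from the saved pipe; this is the step that now errors.
pipe = FluxPipeline.from_pipe(pipe)
```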