save_pretrained 4-bit models with bitsandbytes
With the latest version of the bitsandbytes library (0.39.0), is it now possible to serialize 4-bit models?
If so, this section should be updated so users can save these models: https://github.com/huggingface/transformers/blob/68d53bc7178866821282f45732c1e465f5160fa6/src/transformers/modeling_utils.py#LL1704C36-L1704C36
I'm mainly looking at this article to see what has been produced lately in this area: https://huggingface.co/blog/4bit-transformers-bitsandbytes
I aim to quantize a model to 4-bit so that it can be used in, e.g., GPT4All and other CPU platforms.
cc @younesbelkada
Hi @westn, thanks for the issue. I don't think 4-bit models are serializable yet; let me double-check with the author of bitsandbytes and get back to you.
Also note that 4-bit / 8-bit quantization only applies to GPU (CUDA) devices; you cannot run bitsandbytes-quantized models on a CPU.
Hi @westn, currently it is not possible to save 4-bit models, but this is on the bitsandbytes roadmap for the next releases. We will keep you posted!
Hi @younesbelkada, can 4-bit / 8-bit models be saved now?
Hi @jameswu2014, thanks for the heads-up. 4-bit saving is still not possible; however, 8-bit saving is, see: https://huggingface.co/docs/transformers/main_classes/quantization#push-quantized-models-on-the-hub
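A minimal sketch of the supported 8-bit path from the docs linked above, for anyone landing here. The checkpoint name and the Hub repo id in the commented-out `push_to_hub` call are placeholders, and the load is guarded by a CUDA check since bitsandbytes needs a GPU.

```python
import torch
from transformers import AutoModelForCausalLM

if torch.cuda.is_available():  # bitsandbytes 8-bit also requires CUDA
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-350m",          # example checkpoint
        load_in_8bit=True,
        device_map="auto",
    )
    # Saving works for 8-bit models (unlike 4-bit at this point):
    model.save_pretrained("opt-350m-8bit")
    # model.push_to_hub("your-username/opt-350m-8bit")  # placeholder repo id
```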
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
See linked bnb issue: https://github.com/TimDettmers/bitsandbytes/issues/695
Bumping this since it is a top Google result for the topic and I haven't found an answer elsewhere. Is there any way to "de-quantize" a model that was trained in 4-bit or 8-bit (without a PEFT adapter) so it can be saved and loaded in bfloat16 or some other dtype? This is currently a blocker for training checkpointing, which also means things like early stopping are not possible.
I tried basic things like `model.to(torch.bfloat16)`, but it throws an error stating that bitsandbytes models can't be cast with `.to`.
Is there any update on this? @nathan-az @younesbelkada
Hi @kbulutozler, yes: @poedator is working on it in https://github.com/huggingface/transformers/pull/26037, and the corresponding bitsandbytes PR is here: https://github.com/TimDettmers/bitsandbytes/pull/753