save_pretrained 4-bit models with bitsandbytes
With the latest version of the bitsandbytes library (0.39.0), is it now possible to serialize 4-bit models?
If so, this section should be updated so users can save these models: https://github.com/huggingface/transformers/blob/68d53bc7178866821282f45732c1e465f5160fa6/src/transformers/modeling_utils.py#LL1704C36-L1704C36
I'm mainly looking at this article to see what has been produced lately in this area: https://huggingface.co/blog/4bit-transformers-bitsandbytes
I aim to quantize a model to 4-bit so that it can be used in, e.g., GPT4All and other CPU platforms.
cc @younesbelkada
Hi @westn, thanks for the issue. I don't think 4-bit models are serializable yet; let me double-check with the author of bitsandbytes and get back to you.
Also note that 4-bit / 8-bit quantization only applies to GPU (CUDA) devices; you cannot run bitsandbytes-quantized models on a CPU.
Hi @westn, currently it is not possible to save 4-bit models, but this is on the bitsandbytes roadmap for the next releases. We will keep you posted!
Hi @younesbelkada, can 4-bit / 8-bit models be saved now?
Hi @jameswu2014, thanks for the heads-up. 4-bit saving is still not possible; however, 8-bit saving is, see: https://huggingface.co/docs/transformers/main_classes/quantization#push-quantized-models-on-the-hub
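A minimal sketch of the supported 8-bit path from the docs linked above, for anyone landing here. The checkpoint name and the Hub repo id in the commented-out `push_to_hub` call are placeholders, and the load is guarded by a CUDA check since bitsandbytes needs a GPU.

```python
import torch
from transformers import AutoModelForCausalLM

if torch.cuda.is_available():  # bitsandbytes 8-bit also requires CUDA
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-350m",          # example checkpoint
        load_in_8bit=True,
        device_map="auto",
    )
    # Saving works for 8-bit models (unlike 4-bit at this point):
    model.save_pretrained("opt-350m-8bit")
    # model.push_to_hub("your-username/opt-350m-8bit")  # placeholder repo id
```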
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
See linked bnb issue: https://github.com/TimDettmers/bitsandbytes/issues/695
Bumping this since it is a top Google result for the topic and I haven't found an answer elsewhere. Is there any way to "de-quantize" a model that was trained in 4-bit or 8-bit (without a PEFT adapter) so it can be saved and loaded in bfloat16 or some other dtype? This is currently a blocker for training checkpointing, which also means things like early stopping are not possible.
I tried basic things like `model.to(torch.bfloat16)`, but it throws an error stating that bitsandbytes models can't be cast with `.to`.
Is there any update on this? @nathan-az @younesbelkada
Hi @kbulutozler, yes: @poedator is working on it in https://github.com/huggingface/transformers/pull/26037, and the corresponding bitsandbytes PR is here: https://github.com/TimDettmers/bitsandbytes/pull/753