gemma_pytorch

Is it possible to load 7b-it using quantization config

Open · aliasneo1 opened this issue 1 year ago · 1 comment

Newbie here. The 7b-it model can be loaded on a low-memory device via a quantization config, without using a pre-quantized version of the model, by passing a BitsAndBytes config to Hugging Face's AutoModelForCausalLM, like this:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "/kaggle/input/gemma/transformers/7b-it/2",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quantization_config,
)

Is this kind of loading feasible with your current package?

aliasneo1 · Mar 18, 2024

Unfortunately, the current code doesn't support reading a quantization config specified in the Hugging Face format.

It would require some code changes to make it work. If you are up for it, we definitely welcome such changes and will help you land them.
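In the meantime, the low-memory path the current code does support is loading the separately distributed int8-quantized checkpoint with `quant=True`. A minimal sketch, based on the pattern in scripts/run.py (the variant string and checkpoint path below are placeholders, and this uses int8 weights prepared offline rather than on-the-fly 4-bit quantization like BitsAndBytes):

import torch

from gemma import config as gemma_config
from gemma import model as gemma_model

# Build the 7B config and switch on the int8-quantized weights path.
model_config = gemma_config.get_model_config("7b")
model_config.quant = True        # expects the int8 "-quant" checkpoint variant
model_config.dtype = "float16"

device = torch.device("cuda")

# scripts/run.py wraps this in a context manager that restores the default
# dtype afterwards; shown inline here for brevity.
torch.set_default_dtype(model_config.get_dtype())

model = gemma_model.GemmaForCausalLM(model_config)
model.load_weights("/path/to/gemma-7b-it-quant.ckpt")  # placeholder path
model = model.to(device).eval()

result = model.generate("Write me a poem about Machine Learning.", device, output_len=64)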

pengchongjin · Mar 26, 2024