gemma_pytorch

Is it possible to load 7b-it using quantization config

Open · aliasneo1 opened this issue 1 year ago · 1 comment

Newbie here. The 7b-it model can be loaded on a low-memory device via a quantization config, without using a pre-quantized version of the model, by passing a BitsAndBytes config to Hugging Face's AutoModelForCausalLM, like this:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "/kaggle/input/gemma/transformers/7b-it/2",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quantization_config,
)

Is this kind of loading feasible with your current package?

aliasneo1 · Mar 18, 2024

Unfortunately, the current code doesn't support reading a quantization config specified in the Hugging Face format.

It would require some code changes to make it work. If you are up for it, we definitely welcome such changes and will help you land them.
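In the meantime, the low-memory path the current code does support is loading the separately distributed int8-quantized checkpoint with `quant=True`. A minimal sketch, based on the pattern in scripts/run.py (the variant string and checkpoint path below are placeholders, and this uses int8 weights prepared offline rather than on-the-fly 4-bit quantization like BitsAndBytes):

import torch

from gemma import config as gemma_config
from gemma import model as gemma_model

# Build the 7B config and switch on the int8-quantized weights path.
model_config = gemma_config.get_model_config("7b")
model_config.quant = True        # expects the int8 "-quant" checkpoint variant
model_config.dtype = "float16"

device = torch.device("cuda")

# scripts/run.py wraps this in a context manager that restores the default
# dtype afterwards; shown inline here for brevity.
torch.set_default_dtype(model_config.get_dtype())

model = gemma_model.GemmaForCausalLM(model_config)
model.load_weights("/path/to/gemma-7b-it-quant.ckpt")  # placeholder path
model = model.to(device).eval()

result = model.generate("Write me a poem about Machine Learning.", device, output_len=64)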

pengchongjin · Mar 26, 2024