
Dequantizing int8 models to fp16

Open raunaks13 opened this issue 1 year ago • 0 comments

I have loaded an LLM in Hugging Face transformers with `load_in_8bit=True`. I noticed the objects in the state_dict are structured something like:

  1. model.layers.18.self_attn.k_proj.weight
  2. model.layers.18.self_attn.k_proj.SCB
  3. model.layers.18.self_attn.k_proj.weight_format

The `SCB` and `weight_format` entries are present only in the quantized model. I think `SCB` refers to the scale and bias that can help recreate the original tensor? `weight_format` is a string that says "row". The Hugging Face integration guide mentions a `.CB` field in addition to the `.SCB` field, but I could not find it in the state_dict; perhaps the codebase has changed since that was written?
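For what it's worth, my understanding is that the per-row scale comes from absmax quantization: each row is scaled so its largest absolute value maps to 127. A minimal sketch of that idea (illustrative only, using NumPy instead of torch; `quantize_rowwise_int8` and `scb` are my own names, not the actual bitsandbytes API):

```python
import numpy as np

def quantize_rowwise_int8(w):
    """Per-row absmax quantization: scale each row so its largest
    absolute value maps to 127, then round to int8."""
    scb = np.abs(w).max(axis=1)                            # per-row scale, analogous to .SCB
    q = np.round(w * 127.0 / scb[:, None]).astype(np.int8)  # int8 weights
    return q, scb

w = np.random.randn(4, 8).astype(np.float32)
q, scb = quantize_rowwise_int8(w)
print(q.dtype, scb.shape)  # int8 (4,)
```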

Anyway, I am not sure about the exact method to dequantize the tensor to get back the original, but I tried the following: `(weight_SCB.unsqueeze(1) * weight) / 127`. This gives a tensor that is close to the original model's (what I get without passing `load_in_8bit=True`), but not identical. I am not sure whether this is the correct approach for dequantization. It would be great if someone could point me to code or documentation on how to recreate the exact original tensor from these weights.
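To make the question concrete, here is the formula checked on synthetic data. This is a sketch assuming `SCB` is the per-row absmax (my assumption, not confirmed from the source); the variable names are illustrative, not the actual state_dict keys. The round trip is close but not exact, which matches what I observe:

```python
import numpy as np

# Simulate a quantized layer: int8 weights plus a per-row scale (the SCB analogue)
rng = np.random.default_rng(0)
original = rng.standard_normal((4, 8)).astype(np.float32)
scb = np.abs(original).max(axis=1)                                   # per-row absmax
weight_int8 = np.round(original * 127.0 / scb[:, None]).astype(np.int8)

# The dequantization formula from above: (SCB.unsqueeze(1) * weight) / 127
dequant = scb[:, None] * weight_int8.astype(np.float32) / 127.0

# Close to the original, but not identical: rounding to int8 loses precision
print(np.max(np.abs(dequant - original)))
```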

As a follow-up question: I know that for some models there are outlier values that are left unquantized even though the other values in the tensor are quantized. However, I could not find this information in the state_dict. How can we find and handle these values during the dequantization process?
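For context, my understanding from the LLM.int8() paper is that outliers are handled by mixed-precision decomposition: hidden-state columns whose magnitude exceeds a threshold (6.0 by default) stay in fp16 and only the remaining columns go through the int8 matmul. A rough sketch of the column-splitting idea (illustrative only; the real implementation applies this to activations inside the matmul at runtime, which may be why nothing about it appears in the state_dict):

```python
import numpy as np

def split_outlier_columns(x, threshold=6.0):
    """Separate columns containing any value above `threshold` (kept in
    higher precision) from the rest (candidates for int8 quantization)."""
    outlier_cols = np.abs(x).max(axis=0) > threshold
    return x[:, outlier_cols], x[:, ~outlier_cols], outlier_cols

x = np.array([[0.5, 8.0, -0.3],
              [1.2, -7.5, 0.9]], dtype=np.float32)
fp16_part, int8_part, mask = split_outlier_columns(x)
print(mask)  # [False  True False]
```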

raunaks13 avatar Mar 22 '24 03:03 raunaks13