
Issue with model size after replacing BitLinear layer into a HF model (say Llama2-7b-chat)[BUG]

Open mriganktiwari opened this issue 1 year ago • 3 comments

Describe the bug: When I replace the linear layers of a HF model (say Llama2-7b-chat) with BitLinear, the model size stays the same. Shouldn't the size shrink after the replacement?

mriganktiwari avatar Mar 10 '24 02:03 mriganktiwari

Also, when I use the HF model with the BitLinear layers swapped in, generation does not work.

  • With the original Llama2 model, .generate completes in ~68 seconds.
  • After replacing the linear layers with BitLinear, the same call never finishes.
import time

from transformers import AutoModelForCausalLM, AutoTokenizer
from bitnet import replace_linears_in_hf  # assumed import path for the helper used below

model_name = "meta-llama/Llama-2-7b-hf" #"bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name, token='xxxx')
model = AutoModelForCausalLM.from_pretrained(model_name, token='xxxx')

text = "Tell me about Boxing day significance."
tokenizer.pad_token = tokenizer.eos_token
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Baseline: time generation with the original nn.Linear layers.
start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(f"time for generation: {time.time() - start}")

# Swap the linear layers for BitLinear, then time the same call again.
replace_linears_in_hf(model)

start = time.time()
outputs = model.generate(inputs.input_ids, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(f"time for generation: {time.time() - start}")

mriganktiwari avatar Mar 11 '24 04:03 mriganktiwari

I had a quick look at this repo. In its current state, the binarized weights are still stored as floats, which would explain your observation. The layer also still performs weight multiplications rather than additions and subtractions, so it does not take advantage of the multiplication-free arithmetic of BitNet 1.58. That said, performance (and potential bugs) aside, the results should be identical to BitNet 1.58. Nice to see such attempts!
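
To make both points concrete, here is a small dependency-free sketch. It is not code from this repo. It assumes the absmean ternary scheme described for BitNet 1.58 (weights rounded to -1, 0, or +1 with one shared scale), and the helper names `ternarize`, `pack_2bit`, `unpack_2bit`, and `ternary_dot` are made up for illustration. It compares the storage cost of the same ternary weights kept as float32, as int8, and packed at 2 bits each, and shows that a dot product with ternary weights needs only additions and subtractions plus a single multiply by the scale.

```python
from array import array
import random


def ternarize(weights):
    """Absmean quantization: each weight becomes -1, 0, or +1, plus one shared scale."""
    scale = sum(abs(w) for w in weights) / len(weights) + 1e-8
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def pack_2bit(ternary):
    """Pack four ternary values into each byte (2 bits per value)."""
    packed = bytearray((len(ternary) + 3) // 4)
    for i, t in enumerate(ternary):
        packed[i // 4] |= (t + 1) << (2 * (i % 4))  # store -1, 0, +1 as 0, 1, 2
    return bytes(packed)


def unpack_2bit(packed, n):
    """Inverse of pack_2bit: recover the first n ternary values."""
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) - 1 for i in range(n)]


def ternary_dot(ternary, scale, x):
    """Dot product that only adds or subtracts activations, then scales once."""
    acc = 0.0
    for t, xi in zip(ternary, x):
        if t == 1:
            acc += xi
        elif t == -1:
            acc -= xi
    return acc * scale


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
x = [random.gauss(0.0, 1.0) for _ in range(4096)]

ternary, scale = ternarize(weights)

# Same information, three storage formats.
as_float32 = array("f", [t * scale for t in ternary])  # float-backed layer: no savings
as_int8 = array("b", ternary)
as_packed = pack_2bit(ternary)

float_bytes = len(as_float32) * as_float32.itemsize
int8_bytes = len(as_int8) * as_int8.itemsize
packed_bytes = len(as_packed)
print("bytes (float32, int8, 2-bit packed):", float_bytes, int8_bytes, packed_bytes)

# The multiply-based and add/subtract-based results agree up to rounding.
dense = sum((t * scale) * xi for t, xi in zip(ternary, x))
print("dense:", dense, "add/sub:", ternary_dot(ternary, scale, x))
```

As long as the quantized values live in a float tensor, the footprint matches the original layer. The reduction only appears once the weights are actually stored in a narrower type, and a speedup additionally requires a kernel that exploits the ternary structure.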

matkara avatar Mar 15 '24 12:03 matkara

Stale issue message

github-actions[bot] avatar May 15 '24 12:05 github-actions[bot]