
Converting existing models

Open virentakia opened this issue 1 year ago • 9 comments

Amazing work and a fantastic resource, thanks for sharing - this should jump-start LLM usage on low-resource devices.

Quick question - is there a guide for converting existing models to a BitNet-compliant format?

virentakia avatar Oct 19 '24 23:10 virentakia

I tried gemma-2-27b, gemma-2-9b, and many others; all converted fine with no errors encountered so far, although their 1-bit quants hallucinated a lot.

Dead-Bytes avatar Oct 21 '24 06:10 Dead-Bytes

Unfortunately, no. If a model's weight parameters are not natively ternary, using the conversion function will result in the loss of weight values, leading to inaccurate results. We encourage more training of 1-bit models from scratch.

dawnmsg avatar Oct 21 '24 08:10 dawnmsg
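For illustration, here is a minimal NumPy sketch of the loss dawnmsg describes: applying the absmean ternarization from BitNet b1.58 to weights that were never trained to be ternary discards most of their detail. The matrix size is arbitrary and the snippet is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a trained full-precision weight matrix.
w = rng.normal(size=(1024, 1024)).astype(np.float32)

# Absmean ternarization as described for BitNet b1.58:
# scale by the mean absolute weight, then round and clip to {-1, 0, +1}.
scale = np.abs(w).mean()
w_ternary = np.clip(np.round(w / scale), -1, 1)
w_restored = w_ternary * scale

rel_error = np.linalg.norm(w - w_restored) / np.linalg.norm(w)
print(f"relative reconstruction error: {rel_error:.1%}")  # large for non-ternary weights
```

A model trained with this quantization in the loop learns weights that survive the rounding; a model converted after the fact does not, which matches the hallucination reports above.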

@Dead-Bytes By 'tried Gemma-2-27B' do you mean that you performed QAT from scratch? How did you quantize the Gemma-2 models?

sean-jang00 avatar Oct 21 '24 09:10 sean-jang00

No, I did not perform QAT from scratch; I used the existing I2_S quants available for these models. They show degraded performance (this needs more research to find a way around it), since those 1-bit quants are built without ternary weights. They run fine on my octa-core CPU at a human-readable 7 tokens per second.

Dead-Bytes avatar Oct 21 '24 09:10 Dead-Bytes
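One terminology note: the "1-bit" label is shorthand. A ternary weight carries log2(3) ≈ 1.58 bits of information, and a 2-bit container such as I2_S is the natural way to store it. A hypothetical pack/unpack sketch of that idea - the on-disk layout here is an assumption, not taken from BitNet's actual i2_s kernels:

```python
import numpy as np

def pack_i2(w_ternary: np.ndarray) -> np.ndarray:
    """Pack ternary weights {-1, 0, +1} as 2-bit codes, four per byte."""
    codes = (w_ternary + 1).astype(np.uint8)   # -1/0/+1 -> 0/1/2
    codes = codes.reshape(-1, 4)
    return (codes[:, 0] | codes[:, 1] << 2 |
            codes[:, 2] << 4 | codes[:, 3] << 6)

def unpack_i2(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_i2."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return codes.astype(np.int8).reshape(-1) - 1  # 0/1/2 -> -1/0/+1

w = np.random.default_rng(0).integers(-1, 2, size=64).astype(np.int8)
assert np.array_equal(unpack_i2(pack_i2(w)), w)   # round-trips exactly
```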

@dawnmsg Would training a 70B model from scratch with 1-bit precision require fewer resources than training with full precision? If similar resources are needed, would general developers still be able to perform QAT for a 70B model?

sean-jang00 avatar Oct 21 '24 09:10 sean-jang00

I think this should go to the Qwen2.5 developers. I hope to get a 1-bit Qwen2.5 model, and I expect they want one too.

I don't know how to train from scratch, and I expect I couldn't afford the cost, even using the cloud.

We need a foundation

Deng-Xian-Sheng avatar Oct 22 '24 02:10 Deng-Xian-Sheng

Isn't this what HF1BitLLM/Llama3-8B-1.58-100B-tokens did, though? They started with a base model and fine-tuned it via this method: https://huggingface.co/blog/1_58_llm_extreme_quantization

This suggests that fine-tuning the model in low-bit mode on a specific dataset causes it to lose much of its general knowledge.

While pre-training models in 1.58 bits is resource-intensive, we’ve demonstrated that, with some tricks, it’s possible to fine-tune existing models to this precision level, achieving efficient performance without sacrificing accuracy.

So it can be done, but it sacrifices some of the original model's knowledge and you need to top it up with additional datasets - possible, but not perfect.

Training it from scratch would avoid this.

grctest avatar Nov 07 '24 20:11 grctest
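For reference, the core trick in the linked blog post is to replace each linear layer with a BitLinear-style layer: full-precision "shadow" weights are kept for the optimizer, the forward pass uses their ternarized version, and a straight-through estimator (STE) passes gradients back to the shadow weights. A rough PyTorch sketch with illustrative names, not the blog's actual code:

```python
import torch
import torch.nn as nn

class BitLinear(nn.Linear):
    """Linear layer that ternarizes its weights on the fly (QAT-style)."""

    def ternarize(self, w: torch.Tensor) -> torch.Tensor:
        scale = w.abs().mean().clamp(min=1e-5)        # absmean scale (BitNet b1.58)
        return (w / scale).round().clamp(-1, 1) * scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # STE: forward sees ternary weights, backward treats the rounding as identity.
        w_q = w + (self.ternarize(w) - w).detach()
        return nn.functional.linear(x, w_q, self.bias)

layer = BitLinear(1024, 1024, bias=False)
out = layer(torch.randn(2, 1024))
out.sum().backward()            # gradients reach the full-precision layer.weight
```

Because gradients keep updating the full-precision weights, continued training can recover much of the accuracy lost to ternarization, which is why the linked model needed a further ~100B tokens of fine-tuning.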

No help.



Deng-Xian-Sheng avatar Nov 07 '24 21:11 Deng-Xian-Sheng

I have just discovered BitNet. Before this, I was using models in GGUF format, but they fell below my expectations. Now I want to run the models I trained on BitNet, but I don't know how to do it. I currently have a model in GGUF format like this: "falan42/llama_lora_8b_medical_parallax_2_gguf"

3m1rc1kk avatar Dec 09 '24 05:12 3m1rc1kk