
[BUG/Help] Help needed with quantizing the model

Open oddwatcher opened this issue 3 years ago • 1 comment

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

I want to run this on an R7-6800H laptop using the CPU, but I do not have an NVIDIA card to run the quantization Python code. Can someone provide an int8 or int16 quantized version of the model, or give me instructions on how to quantize it so the CPU can do the work? P.S. Should the model work in llama.cpp?

Expected Behavior

No response

Steps To Reproduce

Have a laptop without an NVIDIA GPU and run the quantization Python code.

Environment

- OS: Windows 11
- Python: 3.10.9
- Transformers: per requirements.txt
- PyTorch: per requirements.txt
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False

Anything else?

No response

oddwatcher avatar Mar 19 '23 16:03 oddwatcher

This might be what you are looking for: https://huggingface.co/THUDM/chatglm-6b-int4 — a pre-quantized int4 checkpoint, so no NVIDIA GPU is needed for the quantization step. Load it on CPU with `model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()`; about 8 GB of RAM is required.
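Expanding the one-liner above into a fuller sketch: this loads the pre-quantized int4 checkpoint entirely on CPU. The `load_chatglm_cpu` helper name is my own; the repo id, `trust_remote_code=True`, `.float()`, and `model.chat(...)` follow the ChatGLM-6B usage shown in this thread and the model card. Assumes the `transformers` version from requirements.txt, ~8 GB free RAM, and a one-time model download on first run.

```python
# Sketch: run the pre-quantized int4 ChatGLM-6B on CPU only.
# `load_chatglm_cpu` is a hypothetical helper name, not part of any library.

REPO = "THUDM/chatglm-6b-int4"  # pre-quantized int4 weights on Hugging Face


def load_chatglm_cpu(repo: str = REPO):
    """Return (tokenizer, model) running in float32 on the CPU."""
    # Import inside the function so the module can be inspected
    # without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    # .float() casts the compute dtype to fp32: CPU inference generally
    # does not support half precision, so this cast is required.
    model = AutoModel.from_pretrained(repo, trust_remote_code=True).float()
    return tokenizer, model.eval()


if __name__ == "__main__":
    tokenizer, model = load_chatglm_cpu()
    # ChatGLM exposes a chat() helper via its remote code.
    response, history = model.chat(tokenizer, "你好", history=[])
    print(response)
```

Expect CPU inference to be slow (seconds per token on a laptop); the int4 checkpoint mainly solves the memory problem, not the speed problem.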

songxxzp avatar Mar 22 '23 09:03 songxxzp