GLM
4-bit quantization of the 10B model
Is there any way to run the 4-bit quantization process from the GLM-130B repo on the 10B GLM model?
https://github.com/THUDM/GLM-130B/blob/main/docs/quantization.md
Because the architectures are similar, I expect the idea should work. But currently the frameworks seem incompatible: the GLM code base doesn't use the SwissArmyTransformer framework, while the quantization scripts require it. Are there any small hacks I can make to fit them together?
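For reference, the core of GLM-130B-style weight-only quantization is simple enough that it can be prototyped outside SwissArmyTransformer. Below is a minimal, hedged sketch of per-row absmax quantization of a linear weight to the signed 4-bit range [-8, 7]; it is not the repo's actual script, and function names here (`quantize_4bit`, `dequantize_4bit`) are illustrative, not from either code base:

```python
import torch

def quantize_4bit(weight: torch.Tensor):
    """Per-row absmax quantization to signed 4-bit integers.

    Returns the quantized weights (stored in int8 for simplicity;
    a real kernel would pack two values per byte) and per-row scales.
    """
    # One scale per output row, mapping the row's max magnitude to 7.
    scale = weight.abs().max(dim=1, keepdim=True).values / 7.0
    scale = scale.clamp(min=1e-8)  # guard against all-zero rows
    q = torch.clamp(torch.round(weight / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_4bit(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Dequantize back to float for the matmul (weight-only scheme:
    # activations stay in full precision).
    return q.to(torch.float32) * scale

w = torch.randn(4, 16)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
max_err = (w - w_hat).abs().max().item()
```

Rounding error per element is bounded by half a quantization step (`scale / 2`), which is why weight-only 4-bit schemes can stay usable for large models.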
If one could fit within 8 GB of VRAM, that would make the GLM 10B models a lot more accessible for end-user experimentation. Thanks!