chatglm.cpp
130B
Can this work with the GLM-130B model? https://github.com/THUDM/GLM-130B
Probably not, at least not for now. It would be extremely slow on CPU, and the model is too large to fit on a single GPU, even an A100-80GB. Supporting it would require tensor parallelism, which is a lot of work.
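A rough back-of-the-envelope sketch of why the weights alone don't fit on one A100-80GB (an illustrative calculation, not a measurement; it ignores activations, KV cache, and runtime overhead):

```python
# Approximate weight memory for a 130B-parameter model at common precisions.
PARAMS = 130e9  # 130 billion parameters

for name, bytes_per_param in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{name}: ~{gib:.0f} GiB")
```

At FP16 the weights alone are roughly 242 GiB, about three times the capacity of an A100-80GB, so without aggressive quantization the model must be sharded across several GPUs via tensor parallelism.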