Support ChatGLM
ChatGLM is a popular ChatGPT-like model for Chinese: https://github.com/THUDM/ChatGLM-6B
Could ct2 support ChatGLM and speed up its inference? Thanks a lot.
+1
Hi @nghuyong, I created a repo, fast-chatglm. I wrote a script to convert ChatGLM based on the Llama converter, but the decoding results are poor. It's likely because some operators don't match up. If you're interested, we can work together to take a look.
Hi, @guillaumekln. If you have time, could you please help take a look at where the problem is with the conversion script? We would greatly appreciate it. HF repo: https://huggingface.co/THUDM/chatglm-6b/tree/main
Cool, let me look into it.
ChatGLM is not exactly a Llama model. There are several differences that are not (yet?) supported in CTranslate2:
- `position_encoding_2d` is not implemented
- the residual connection is different than in other models: they add the layer norm output instead of the input, and also apply a scale value
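To illustrate the second difference, here is a minimal sketch of the ChatGLM-style residual connection (the function names and signatures are illustrative, not CTranslate2 or HF API):

```python
import math

def glm_residual(hidden, sublayer, layernorm, num_layers):
    # A standard pre-LN Transformer block computes:
    #     hidden + sublayer(layernorm(hidden))
    # ChatGLM-6B instead adds the layer norm *output*, scaled by alpha:
    alpha = math.sqrt(2 * num_layers)  # with 28 layers, alpha is about 7.48
    ln_out = layernorm(hidden)
    return [h * alpha + s for h, s in zip(ln_out, sublayer(ln_out))]
```

For example, with an identity layer norm and a sublayer that returns zeros, the input is simply scaled by alpha, whereas a standard residual would return it unchanged.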
I'm also not sure about this `PrefixEncoder`, but it does not seem to be enabled for the current model.
The `PrefixEncoder` can be ignored, as it is only needed when loading additional p-tuning parameters. Will CTranslate2 support `position_encoding_2d` in the foreseeable future?
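For reference, `position_encoding_2d` gives every token two position ids, which are fed into two halves of the rotary embedding. A rough sketch of how ChatGLM-6B builds them during generation (illustrative, not the exact HF modeling code; `mask_pos` stands for the index of the [gMASK] token):

```python
def build_2d_position_ids(context_len, mask_pos, num_generated):
    # First channel: absolute position within the prompt;
    # all generated tokens reuse the [gMASK] position.
    pos = list(range(context_len)) + [mask_pos] * num_generated
    # Second channel: 0 over the prompt, then 1..n counting generated tokens.
    block = [0] * context_len + list(range(1, num_generated + 1))
    return pos, block
```

For example, `build_2d_position_ids(4, 3, 2)` returns `([0, 1, 2, 3, 3, 3], [0, 0, 0, 0, 1, 2])`.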