Support ChatGLM
ChatGLM is a popular ChatGPT-like model for Chinese: https://github.com/THUDM/ChatGLM-6B
Could ct2 support ChatGLM and speed up its inference? Thanks a lot.
+1
Hi @nghuyong, I created a repo, fast-chatglm. I wrote a script to convert ChatGLM based on the Llama converter, but the decoding results are poor. It's likely because some operators don't match up. If you're interested, we can work together to take a look.
Hi, @guillaumekln. If you have time, could you please help take a look at where the problem is with the conversion script? We would greatly appreciate it. HF repo: https://huggingface.co/THUDM/chatglm-6b/tree/main
Cool, let me look into it.
ChatGLM is not exactly a Llama model. There are several differences that are not (yet?) supported in CTranslate2:
- `position_encoding_2d` is not implemented
- the residual connection is different than in other models: they add the layer norm output instead of the input, and also apply a scale value
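To illustrate the second difference, here is a minimal sketch of the ChatGLM-style residual connection (the function names and signatures are illustrative, not CTranslate2 or HF API):

```python
import math

def glm_residual(hidden, sublayer, layernorm, num_layers):
    # A standard pre-LN Transformer block computes:
    #     hidden + sublayer(layernorm(hidden))
    # ChatGLM-6B instead adds the layer norm *output*, scaled by alpha:
    alpha = math.sqrt(2 * num_layers)  # with 28 layers, alpha is about 7.48
    ln_out = layernorm(hidden)
    return [h * alpha + s for h, s in zip(ln_out, sublayer(ln_out))]
```

For example, with an identity layer norm and a sublayer that returns zeros, the input is simply scaled by alpha, whereas a standard residual would return it unchanged.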
I'm also not sure about this `PrefixEncoder`, but it does not seem to be enabled for the current model.
The `PrefixEncoder` can be ignored, as it is only needed when loading additional p-tuning parameters. Will CTranslate2 support `position_encoding_2d` in the foreseeable future?
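For reference, `position_encoding_2d` gives every token two position ids, which are fed into two halves of the rotary embedding. A rough sketch of how ChatGLM-6B builds them during generation (illustrative, not the exact HF modeling code; `mask_pos` stands for the index of the [gMASK] token):

```python
def build_2d_position_ids(context_len, mask_pos, num_generated):
    # First channel: absolute position within the prompt;
    # all generated tokens reuse the [gMASK] position.
    pos = list(range(context_len)) + [mask_pos] * num_generated
    # Second channel: 0 over the prompt, then 1..n counting generated tokens.
    block = [0] * context_len + list(range(1, num_generated + 1))
    return pos, block
```

For example, `build_2d_position_ids(4, 3, 2)` returns `([0, 1, 2, 3, 3, 3], [0, 0, 0, 0, 1, 2])`.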