How to convert a finetuned MOSS model to a quantized model?

Open qgpmztmf opened this issue 2 years ago • 3 comments

I couldn't find code that implements this process in this repository. Has anyone successfully converted a finetuned MOSS model to its quantized version? If so, could you please share the steps or code you used?

May 08 '23 06:05 qgpmztmf

May 11 '23 08:05 qgpmztmf

I tested their int4 model and found that inference with the quantized model is actually slower than with the unquantized one.

May 16 '23 02:05 JIEKEXIAN

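I tested their int4 model and found that inference with the quantized model is actually slower than with the unquantized one.

Quantization does not necessarily speed up inference; its main purpose is to reduce the GPU memory the model occupies.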
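
To put rough numbers on the memory point, here is a minimal back-of-the-envelope sketch. It assumes a model with about 16 billion parameters (an illustrative figure, not taken from this thread) and counts only the weights; the helper name `weight_memory_gib` is made up for this example. Real usage is higher, because quantization scales and zero-points, activations, and the KV cache are ignored here.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: roughly 16B parameters, used only for illustration.
NUM_PARAMS = 16_000_000_000

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the int4 weights take about a quarter of the fp16 footprint. That saving says nothing about speed: if the int4 weights have to be dequantized on the fly during inference, that extra work can make generation slower than fp16.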

May 31 '23 07:05 qgpmztmf