MOSS
How to convert a finetuned MOSS model to a quantized model?
I couldn't find the code for this process in this repository. Has anyone successfully converted a finetuned MOSS model to its quantized version? If so, could you please share the steps or code you used?
I tested their int4 model and found that inference after quantization is actually slower than before quantization.
Quantization does not necessarily speed up inference. Its main purpose is to reduce the GPU memory the model occupies.
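To put rough numbers on the memory point, here is a minimal back-of-the-envelope sketch. It assumes a 16B-parameter model purely for illustration (`NUM_PARAMS` and the helper `weight_memory_gib` are hypothetical names, not part of the MOSS code) and counts only the raw weights. Quantization scales and zero points, activations, and the KV cache are ignored, so real usage will be higher.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumed parameter count, for illustration only.
NUM_PARAMS = 16_000_000_000

# Raw weight storage at each precision (no quantization metadata included).
footprints = {
    name: weight_memory_gib(NUM_PARAMS, bits)
    for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]
}

for name, gib in footprints.items():
    print(f"{name}: ~{gib:.1f} GiB of weights")
```

The weight footprint shrinks in proportion to the bit width, but that says nothing about speed. Low-bit weights typically have to be unpacked or dequantized during the forward pass, so without kernels tuned for that, inference can end up slower than fp16, which would be consistent with the measurement reported above.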