Jincheng Miao
Results: 3 issues of Jincheng Miao
Server CPUs have many cores that can be used to quantize LLMs. This patch adds support for quantizing LLMs on the CPU alone, using the float data type.
Update content related to the Convert Python API and add quantization options.
documentation
invokeLayerLLaMA API enhancement: (1) add bf16_int8 dtype support; (2) add kvcache dtype argument; (3) add RoPE type argument.