Jincheng Miao

3 issues by Jincheng Miao

Server CPUs have many cores that can be used to quantize LLMs. This patch adds support for quantizing LLMs on the CPU alone, using the float data type.
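As a general illustration of what quantizing float weights on a CPU involves, here is a minimal sketch in plain Python. It is not the code from this patch: the function names `quantize_int8` and `dequantize_int8` are hypothetical, and symmetric per-tensor int8 quantization is an assumed scheme chosen only for simplicity.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization of a list of floats.

    Hypothetical helper for illustration; the scheme is an assumption,
    not the method used by the patch.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes and the scale."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Largest absolute difference between original and restored weights.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Because every value is rounded to the nearest multiple of `scale`, the reconstruction error per weight is bounded by half of `scale`. Each weight is processed independently, which is why this kind of work spreads well across many CPU cores.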

Update content related to the Convert Python API and add quantization options.

documentation

invokeLayerLLaMA API enhancement:
1. Add bf16_int8 dtype support
2. Add kvcache dtype argument
3. Add Rope type argument