
Update AWQ/GPTQ quantization guide

miaojinc opened this issue 1 year ago • 2 comments

Update the content related to the Convert Python API and add quantization options.
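A minimal sketch of what the updated conversion call could look like, assuming the existing `xft.LlamaConvert().convert(input_dir, output_dir)` entry point; the quantization keyword below is a hypothetical placeholder for the new options, not the final API:

```python
import xfastertransformer as xft

# Convert a Hugging Face checkpoint into the xFasterTransformer format.
# "from_quantized_model" is a hypothetical option name standing in for the
# AWQ/GPTQ quantization options described in the updated guide.
xft.LlamaConvert().convert(
    "/path/to/hf_model",          # input: Hugging Face model directory
    "/path/to/xft_model",         # output: converted model directory
    from_quantized_model="gptq",  # hypothetical: convert a GPTQ-quantized source
)
```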

miaojinc · Apr 10 '24

My concern is that the packages in requirements.txt may trigger some issues during security checking; let's target the next version.

pujiang2018 · Jun 04 '24

> My concern is that the packages in requirements.txt may trigger some issues during security checking; let's target the next version.

Yes, it has some potential issues. The reason for pinning those package versions is that we lack group quantization operators, so we have to quantize the models on the CPU to match our kernel. If we can align with autoawq and autogptq, we can load the quantized weights directly without any modification to autoawq and autogptq.
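To illustrate what "group quantization" means here: weights are quantized in small groups along the input dimension, with a per-group scale, which is the layout AWQ/GPTQ-style kernels consume. A minimal NumPy sketch (illustrative only, not the actual xFasterTransformer CPU kernel):

```python
import numpy as np

def group_quantize(w: np.ndarray, group_size: int = 128, bits: int = 4):
    """Symmetric per-group quantization (illustrative, not the xFT kernel).

    Each row of `w` is split into groups of `group_size` values and every
    group gets its own scale, the layout AWQ/GPTQ-style kernels expect.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit symmetric
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must be a multiple of group_size"
    grouped = w.reshape(rows, cols // group_size, group_size)
    scales = np.abs(grouped).max(axis=-1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)          # avoid division by zero
    q = np.clip(np.round(grouped / scales), -qmax - 1, qmax).astype(np.int8)
    return q.reshape(rows, cols), scales.squeeze(-1)

# Quantize on the CPU, then dequantize to check the reconstruction error.
w = np.random.randn(16, 256).astype(np.float32)
q, s = group_quantize(w)
w_hat = (q.reshape(16, -1, 128) * s[..., None]).reshape(16, 256)
print(np.abs(w - w_hat).max())                 # small for well-behaved weights
```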

miaojinc · Jun 04 '24

Closed; this has been merged in https://github.com/intel/xFasterTransformer/commit/59a9430d4ee2ca99de4ca4ea78b9f3eba868e900.

Duyi-Wang · Sep 02 '24