
Update AWQ/GPTQ quantization guide

miaojinc opened this issue 1 year ago • 2 comments

Update the content related to the Convert Python API and add quantization options.
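A minimal sketch of what the updated conversion call could look like, assuming the existing `xft.LlamaConvert().convert(input_dir, output_dir)` entry point; the quantization keyword below is a hypothetical placeholder for the new options, not the final API:

```python
import xfastertransformer as xft

# Convert a Hugging Face checkpoint into the xFasterTransformer format.
# "from_quantized_model" is a hypothetical option name standing in for the
# AWQ/GPTQ quantization options described in the updated guide.
xft.LlamaConvert().convert(
    "/path/to/hf_model",          # input: Hugging Face model directory
    "/path/to/xft_model",         # output: converted model directory
    from_quantized_model="gptq",  # hypothetical: convert a GPTQ-quantized source
)
```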

miaojinc · Apr 10 '24

My concern is that the packages in requirements.txt may trigger some issues during security checking; let's target the next version.

pujiang2018 · Jun 04 '24

> My concern is that the packages in requirements.txt may trigger some issues during security checking; let's target the next version.

Yes, it has some potential issues. The reason for pinning those package versions is that we lack group quantization operators, so we have to quantize the models on the CPU to match our kernel. If we can align with autoawq and autogptq, we can load the quantized weights directly without any modification to autoawq and autogptq.
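To illustrate what "group quantization" means here: weights are quantized in small groups along the input dimension, with a per-group scale, which is the layout AWQ/GPTQ-style kernels consume. A minimal NumPy sketch (illustrative only, not the actual xFasterTransformer CPU kernel):

```python
import numpy as np

def group_quantize(w: np.ndarray, group_size: int = 128, bits: int = 4):
    """Symmetric per-group quantization (illustrative, not the xFT kernel).

    Each row of `w` is split into groups of `group_size` values and every
    group gets its own scale, the layout AWQ/GPTQ-style kernels expect.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit symmetric
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must be a multiple of group_size"
    grouped = w.reshape(rows, cols // group_size, group_size)
    scales = np.abs(grouped).max(axis=-1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)          # avoid division by zero
    q = np.clip(np.round(grouped / scales), -qmax - 1, qmax).astype(np.int8)
    return q.reshape(rows, cols), scales.squeeze(-1)

# Quantize on the CPU, then dequantize to check the reconstruction error.
w = np.random.randn(16, 256).astype(np.float32)
q, s = group_quantize(w)
w_hat = (q.reshape(16, -1, 128) * s[..., None]).reshape(16, 256)
print(np.abs(w - w_hat).max())                 # small for well-behaved weights
```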

miaojinc · Jun 04 '24

Closed; this has been merged in https://github.com/intel/xFasterTransformer/commit/59a9430d4ee2ca99de4ca4ea78b9f3eba868e900.

Duyi-Wang · Sep 02 '24