
[query] is the int symmetric quantisation only for unsigned int?

Open EricLiclair opened this issue 1 year ago • 1 comments

based on the implementations here - https://github.com/intel/auto-round/blob/9718d20333e4448b7cda96074ef481668d19c861/auto_round/data_type/int.py#L70

and my experiments at - https://colab.research.google.com/drive/1rjfaNl8B_9sQMMYXupk4DWtxwhR6LNpx?usp=sharing

It seems that symmetric int quantization is intentionally implemented only for unsigned int. Am I correct on this? As I understand it, the expected zero-point for symmetric int quantization is 0, which is not the case here: currently zp=8 if bits=4, zp=128 if bits=8, and so on, which suggests the target data type is unsigned int.
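To make the observation concrete, here is a minimal pure-Python sketch of symmetric quantization stored as unsigned integers with a fixed zero-point of 2^(bits-1). It is not auto-round's code: the helper names `fixed_zero_point` and `quantize_unsigned` and the abs-max scale rule are assumptions chosen only to reproduce the zp values quoted above.

```python
def fixed_zero_point(bits):
    # Midpoint of the unsigned range [0, 2**bits - 1]: 8 for 4 bits, 128 for 8 bits.
    return 2 ** (bits - 1)


def quantize_unsigned(weights, bits=4):
    # Hypothetical helper: symmetric quantization stored in an unsigned range.
    maxq = 2 ** bits - 1
    zp = fixed_zero_point(bits)
    absmax = max(abs(w) for w in weights)
    # Assumed scale rule (abs-max over the zero-point), for illustration only.
    scale = absmax / zp if absmax > 0 else 1.0
    # Shift by zp so that a real value of 0.0 maps to the integer zp, then clip.
    q = [min(max(round(w / scale) + zp, 0), maxq) for w in weights]
    return q, scale, zp


q, scale, zp = quantize_unsigned([-1.0, 0.0, 0.5, 1.0], bits=4)
print(q, scale, zp)
```

In this sketch the real value 0.0 lands on the integer 8, so the grid is still symmetric around zero even though the stored zero-point is not 0.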

I wanted to know which int types (signed or unsigned) are generally preferred and recommended for inference-time optimisations?

EricLiclair avatar Aug 01 '24 20:08 EricLiclair

Yes, you are correct. This is intentionally designed to align with GPTQ's logic, allowing us to leverage their CUDA kernel. Besides, the fixed zero point (zp) can be converted to zp=0 to utilize the Marlin kernel, which requires zp=0.
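The conversion mentioned above comes down to a simple identity: subtracting the fixed zero-point from each unsigned value gives a signed value with zp=0, and dequantization is unchanged. The following is a rough sketch of that arithmetic only; `to_signed` and `dequantize` are hypothetical helpers, and how the Marlin kernel actually packs or stores weights is not shown here.

```python
def to_signed(q_unsigned, bits=4):
    # Shift unsigned values in [0, 2**bits - 1] to signed values in
    # [-2**(bits-1), 2**(bits-1) - 1] by removing the fixed zero-point.
    zp = 2 ** (bits - 1)
    return [q - zp for q in q_unsigned]


def dequantize(q, scale, zp):
    # Standard affine dequantization: real = (q - zp) * scale.
    return [(v - zp) * scale for v in q]


q_u = [0, 8, 12, 15]   # example unsigned 4-bit values with zp=8
scale = 0.125          # example scale, chosen for illustration
q_s = to_signed(q_u, bits=4)

# Both representations decode to the same real values.
print(q_s)
print(dequantize(q_u, scale, 8) == dequantize(q_s, scale, 0))
```

Because `(q - 8) * scale` and `((q - 8) - 0) * scale` are the same expression, the two representations are interchangeable at the level of values; only the stored integer type and zero-point differ.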

wenhuach21 avatar Aug 02 '24 01:08 wenhuach21

Feel free to reopen this if you have additional questions.

wenhuach21 avatar Sep 11 '24 03:09 wenhuach21