auto-round [query]: is symmetric int quantization only for unsigned int?

Based on the implementation here: https://github.com/intel/auto-round/blob/9718d20333e4448b7cda96074ef481668d19c861/auto_round/data_type/int.py#L70

and my experiments here: https://colab.research.google.com/drive/1rjfaNl8B_9sQMMYXupk4DWtxwhR6LNpx?usp=sharing

It seems that symmetric int quantization is intentionally implemented only for unsigned int. Am I correct? My understanding is that the expected zero point for symmetric int quantization is 0, which is not the case here: currently zp=8 for bits=4, zp=128 for bits=8, and so on (i.e. zp = 2^(bits-1)). That matches a setup where the target data type is unsigned int.
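To make the behaviour described above concrete, here is a minimal pure-Python sketch of symmetric quantization stored as unsigned codes with a fixed zero point of 2^(bits-1). The scale formula `2 * max|w| / (2**bits - 1)` and the helper names `quantize_sym_unsigned` / `dequantize` are assumptions for illustration, not auto-round's actual code (the linked int.py is the reference for that).

```python
def quantize_sym_unsigned(weights, bits=4):
    """Symmetric quantization stored as unsigned codes with a fixed zero point.

    Assumed (GPTQ-style) scheme, for illustration only:
      scale = 2 * max|w| / (2**bits - 1), zp = 2**(bits - 1)
    """
    maxq = 2 ** bits - 1                 # largest unsigned code: 15 for 4 bits
    zp = 2 ** (bits - 1)                 # fixed zero point: 8 for 4 bits, 128 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = 2 * max_abs / maxq if max_abs > 0 else 1.0
    # Round to the nearest integer, shift by zp, and clamp into [0, maxq].
    q = [min(max(round(w / scale) + zp, 0), maxq) for w in weights]
    return q, scale, zp


def dequantize(q, scale, zp):
    """Map unsigned codes back to floats; the code equal to zp represents 0.0."""
    return [scale * (qi - zp) for qi in q]


if __name__ == "__main__":
    w = [0.3, -0.7, 0.0, 1.2]
    q, scale, zp = quantize_sym_unsigned(w, bits=4)
    print(zp)                            # 8
    print(q[2] == zp)                    # True: 0.0 maps to the zero point
    print(dequantize(q, scale, zp))
```

With this assumed scale, max|w| lands at ±(2^bits - 1)/2 (±7.5 for 4 bits) before the shift, so every weight dequantizes to within scale/2 of its original value even though the single top code gets clipped.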

Which int types (signed or unsigned) are generally preferred and recommended for inference-time optimizations?

Aug 01 '24 20:08 EricLiclair

Yes, you are correct. This is intentionally designed to align with GPTQ's logic, which allows us to leverage its CUDA kernel. In addition, the fixed zero point (zp) can be converted to zp=0 to use the Marlin kernel, which requires zp=0.
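A sketch of why a fixed zero point can be converted to zp=0, assuming the unsigned layout with zp = 2^(bits-1) from the question: subtracting zp from each code gives a signed integer in [-2^(bits-1), 2^(bits-1) - 1] that dequantizes to exactly the same value, and its two's-complement bit pattern is simply the unsigned code with the top bit flipped. The function names are hypothetical, and this does not describe how the GPTQ or Marlin kernels actually pack weights.

```python
def to_signed(q, bits):
    """Shift unsigned codes with fixed zp = 2**(bits-1) to signed ints with zp = 0."""
    zp = 2 ** (bits - 1)
    return [qi - zp for qi in q]


def flip_top_bit_matches(q, bits):
    """Check that each signed value's two's-complement bits equal the code with its top bit flipped."""
    zp = 2 ** (bits - 1)
    mask = 2 ** bits - 1
    # Python's & on a negative int yields its two's-complement low bits.
    return all(((qi - zp) & mask) == (qi ^ zp) for qi in q)


def dequant_unsigned(q, scale, bits):
    zp = 2 ** (bits - 1)
    return [scale * (qi - zp) for qi in q]


def dequant_signed(s, scale):
    # zp = 0, so dequantization is just a multiply.
    return [scale * si for si in s]


if __name__ == "__main__":
    bits, scale = 4, 0.05
    codes = list(range(2 ** bits))       # every 4-bit unsigned code, 0..15
    signed = to_signed(codes, bits)
    print(min(signed), max(signed))      # -8 7
    print(dequant_unsigned(codes, scale, bits) == dequant_signed(signed, scale))  # True
    print(flip_top_bit_matches(codes, bits))  # True
```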

Aug 02 '24 01:08 wenhuach21

Feel free to reopen this if you have additional questions.

Sep 11 '24 03:09 wenhuach21