[query] Is the int symmetric quantisation only for unsigned int?
Based on the implementation here - https://github.com/intel/auto-round/blob/9718d20333e4448b7cda96074ef481668d19c861/auto_round/data_type/int.py#L70
and my experiments at - https://colab.research.google.com/drive/1rjfaNl8B_9sQMMYXupk4DWtxwhR6LNpx?usp=sharing
it seems that symmetric int quantization is intentionally only for unsigned int. Am I correct on this? Based on my understanding, the expected zero-point for symmetric int quantization is 0, which is not the case here: currently zp=8 if bits=4, zp=128 if bits=8, and so on, which suggests that the target data type is unsigned int.
I also wanted to know which int types (signed or unsigned) are generally preferred and recommended for inference-time optimisations.
Yes, you are correct. This is intentional: it aligns with GPTQ's logic, which allows us to leverage their CUDA kernel. In addition, the fixed zero point (zp) can be converted to zp=0 in order to use the Marlin kernel, which requires zp=0.
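To illustrate why a fixed zero point and zp=0 can represent the same values, here is a minimal sketch in plain Python. It is not the auto-round, GPTQ, or Marlin code: the helper names (quantize_sym_fixed_zp, dequantize, to_signed_zero_zp) are hypothetical, and the scale rule (max_abs / zp) is an assumption chosen to keep the arithmetic easy to follow. It only shows that an unsigned code with zp = 2^(bits-1) and a signed code with zp = 0 dequantize to the same numbers.

```python
def quantize_sym_fixed_zp(weights, bits=4):
    """Hypothetical helper: symmetric quantization stored as unsigned ints."""
    maxq = 2 ** bits - 1      # largest unsigned code, e.g. 15 for 4 bits
    zp = 2 ** (bits - 1)      # fixed zero point: 8 for 4 bits, 128 for 8 bits
    max_abs = max(abs(w) for w in weights)
    # Assumption for this sketch: map -max_abs to code 0.
    scale = max_abs / zp if max_abs > 0 else 1.0
    q = [min(max(round(w / scale) + zp, 0), maxq) for w in weights]
    return q, scale, zp


def dequantize(q, scale, zp):
    """Affine dequantization: (q - zp) * scale."""
    return [(qi - zp) * scale for qi in q]


def to_signed_zero_zp(q, zp):
    """Shift unsigned codes by zp so the equivalent zero point becomes 0."""
    return [qi - zp for qi in q]


weights = [-1.0, -0.5, 0.0, 0.26, 0.9]
q_unsigned, scale, zp = quantize_sym_fixed_zp(weights, bits=4)
q_signed = to_signed_zero_zp(q_unsigned, zp)

print(q_unsigned)  # [0, 4, 8, 10, 15]  (unsigned codes, zp = 8)
print(q_signed)    # [-8, -4, 0, 2, 7]  (signed codes, zp = 0)

# Both representations dequantize to identical values.
deq_unsigned = dequantize(q_unsigned, scale, zp)
deq_signed = dequantize(q_signed, scale, 0)
print(deq_unsigned == deq_signed)  # True
```

In this sketch the signed codes fall in [-2^(bits-1), 2^(bits-1) - 1], so for bits=4 they fit a signed 4-bit range, and a weight of exactly 0.0 maps to the zero point in both forms.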
Feel free to reopen this if you have additional questions.