Quantization method is different from the paper?
Hello, thanks for your excellent work; this project has been very helpful to me.
However, I found that the quantization method in this project differs somewhat from the one described in the paper.
In the paper, the quantization functions are:
Weights are quantized using: (equation image not preserved), which I think may be a mistake in the paper.
Activations are quantized using: (equation image not preserved)
In this project, weights are quantized using: (equation image not preserved)
Activations are quantized using: (equation image not preserved)
In fact, according to quantize_module_.py, both weights and activations are quantized with the gemm method (an asymmetric uniform quantization method).
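For readers unfamiliar with the term, here is a minimal sketch of what asymmetric uniform quantization typically looks like. This is an illustrative example only, not the project's actual code: the function name and the min/max calibration are my assumptions, and real implementations usually keep the integer tensor and zero point around rather than dequantizing immediately.

```python
import numpy as np

def asymmetric_uniform_quantize(x, n_bits=8):
    # Hypothetical helper (not from the repo): map the observed range
    # [x_min, x_max] onto the unsigned integer grid [0, 2^n_bits - 1].
    qmax = 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Guard against a degenerate all-constant tensor.
    scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
    # The zero point shifts the grid so that x_min maps near 0;
    # this asymmetry is what distinguishes it from symmetric schemes.
    zero_point = round(-x_min / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    # Dequantize back to floats to simulate the quantization error.
    return (q - zero_point) * scale

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
xq = asymmetric_uniform_quantize(x, n_bits=4)
```

Each element of `xq` is the nearest point on a 16-level grid spanning the tensor's range, so the round-trip error is bounded by roughly half a quantization step.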
Is there any comparison between the two methods?
Best wishes.
There is a small mistake in the weight quantization formula in the paper. The weight quantization implementation in the code is correct.