Saleh Ashkboos
Hi, thanks for opening this issue. Actually, you cannot load GPTQ weights into QuaRot, as the approaches are a bit different. In QuaRot, we fuse some randomized Hadamard matrices...
Thanks @xinghaow99. I think the whole point is about scaling, and you can try to scale the distribution using a single number (maybe divide the values by the maximum value).
@xinghaow99 Yes. We only care about the dynamic range during quantization.
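To illustrate the point about dynamic range, here is a minimal sketch, assuming a symmetric round-to-nearest quantizer with a single per-tensor scale. The function `quantize_symmetric` and the sample values are hypothetical and are not taken from the QuaRot code base. It shows that dividing all values by one number (here the maximum absolute value) leaves the integer codes unchanged and only changes the scale.

```python
def quantize_symmetric(values, bits=4):
    """Symmetric round-to-nearest quantization with one per-tensor scale.

    Hypothetical helper for illustration only, not the QuaRot implementation.
    """
    qmax = 2 ** (bits - 1) - 1                     # 7 for 4-bit symmetric
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Map each value to an integer code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


values = [0.5, -1.1, 0.25, 2.0, -0.75]

# Rescale the whole distribution by a single number: its maximum magnitude.
max_abs = max(abs(v) for v in values)
normalized = [v / max_abs for v in values]

codes_raw, scale_raw = quantize_symmetric(values)
codes_norm, scale_norm = quantize_symmetric(normalized)

print(codes_raw)
print(codes_norm)
print(scale_raw, scale_norm)
```

Under these assumptions both calls produce the same integer codes, and the rescaling factor is absorbed entirely by the quantization scale.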
Thanks @Gloria2tt for your issue. I'm not sure that I understood your problem. Can you please share your code/config to reproduce the issue? Thanks.
Thanks @Niko-zyf for your issue. I am not sure if I got your issue right. I remember that we did a rotation in FP32 and this did not change the...
@ggerganov Thanks for your interest in our work. I am the main author of QuaRot. I would be happy to discuss/plan for this and help to integrate it into the...
Thanks @kimjoohyungsd for your issue. For a head dimension of 128, we fuse a Hadamard matrix of size 128 into the output of `W_v`. So in the above, the `softmax(QK^T)V`...
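As a rough illustration of why a Hadamard matrix can be applied on the value side, here is a minimal sketch of the underlying algebra only: `softmax(S)(VH) = (softmax(S)V)H` for an orthogonal `H`, so the rotation passes through the attention product and can be undone afterwards with `H^T`. Everything here is an assumption made for illustration: the helper names, the toy `head_dim = 4` standing in for 128, and the sample matrices. It does not show where QuaRot actually applies the inverse.

```python
import math


def hadamard(n):
    """Normalized Sylvester Hadamard matrix; n must be a power of two."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[x * s for x in row] for row in h]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(x - mx) for x in row]
        total = sum(e)
        out.append([x / total for x in e])
    return out


head_dim = 4  # toy stand-in for the head dimension of 128
H = hadamard(head_dim)

# Hypothetical attention scores (3 tokens) and values (3 tokens x head_dim).
scores = [[0.1, 0.7, -0.3], [1.2, 0.0, 0.4], [-0.5, 0.3, 0.9]]
V = [[1.0, 2.0, -1.0, 0.5], [0.0, -1.5, 2.0, 1.0], [3.0, 0.25, -0.75, -2.0]]

P = softmax_rows(scores)
plain = matmul(P, V)                        # softmax(S) V
rotated = matmul(P, matmul(V, H))           # softmax(S) (V H)
recovered = matmul(rotated, transpose(H))   # undo the rotation with H^T

max_err = max(
    abs(a - b) for ra, rb in zip(plain, recovered) for a, b in zip(ra, rb)
)
print(max_err)
```

The printed error should be at floating-point rounding level, since the attention weights act on the token axis while `H` acts on the head dimension.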