QuaRot

Question about online Hadamard Heads in Figure 6

Open · kimjoohyungsd opened this issue 3 months ago · 1 comment

Hi, I'm curious about the effectiveness of applying the online Hadamard operation to the attention outputs, labeled 'hadamard heads' in Figure 6. I believe the purpose of this operation is to match the rotation size to the quantization size, since that is the most accurate choice in most cases. However, I find that the online operation does not seem to improve model accuracy compared with using no online operation at all, i.e., applying the same Hadamard matrix to both W_v and W_out. W_out already seems to have a quantization-friendly distribution, so what was the reason for applying an online Hadamard operation after softmax(QK^T)V?

| Perplexity | W8A8 | W4A8 | W4A4 |
|---|---|---|---|
| QuaRot | 5.481 | 6.701 | 8.097 |
| Single Block | 5.475 | 6.757 | 7.953 |

May I ask whether you found accuracy improvements from applying the online operations?

kimjoohyungsd · Oct 29 '25 02:10

Thanks @kimjoohyungsd for your issue.

For a head dimension of 128, we fuse a Hadamard of size 128 into the output of W_v. So in the setting above, softmax(QK^T)V is already Hadamard-transformed, but only with a size-128 Hadamard within each head. We apply another Hadamard transform across the heads so that the two together match the full Hadamard. This is important when you want to do token-wise quantization. Let me explain with an example.
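As a sanity check of the claim that a per-head Hadamard followed by a cross-head Hadamard matches the full Hadamard, here is a minimal NumPy sketch. It assumes Sylvester-ordered Hadamard matrices and a model dimension that is a power-of-two multiple of the head dimension; `sylvester_hadamard` is a hypothetical helper written for illustration, not code from the QuaRot repository. The identity being checked is `H_d = H_{n_heads} ⊗ H_{head_dim}`.

```python
import numpy as np

def sylvester_hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix of size n (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])  # H_2n = H_2 kron H_n
    return h / np.sqrt(n)

d_model, head_dim = 256, 128
n_heads = d_model // head_dim

rng = np.random.default_rng(0)
x = rng.standard_normal(d_model)  # one token of softmax(QK^T)V, heads concatenated

# Step 1: size-128 Hadamard inside each head (the part fused into W_v).
per_head = x.reshape(n_heads, head_dim) @ sylvester_hadamard(head_dim).T

# Step 2: online Hadamard across heads, mixing element k of every head.
combined = (sylvester_hadamard(n_heads) @ per_head).reshape(-1)

# A single size-256 Hadamard applied to the whole token.
full = sylvester_hadamard(d_model) @ x

print(np.allclose(combined, full))  # True
```

The two factors act on different axes (positions within a head vs. heads), so their order does not matter. Only the within-head factor can be folded into W_v, because attention mixes tokens inside each head but never mixes heads; the cross-head factor has to be applied to the activation itself, online.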

Consider a model dimension of 256 and a head dimension of 128. The first 128 elements of softmax(QK^T)V are transformed with a Hadamard matrix of size 128, and the remaining 128 are transformed with another (identical) Hadamard matrix of size 128. Because these two parts can have different scales, quantizing them together with a single scale can cause problems: the scale is set by the larger part, so the smaller part gets only a few quantization levels. This is why we apply a Hadamard across the two parts to combine them.
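The scale problem in the 256/128 example can be reproduced with a toy experiment. This sketch assumes synthetic Gaussian activations in which head 0 is 10x larger than head 1, and 4-bit symmetric round-to-nearest quantization with one scale per token; both are illustrative assumptions, not QuaRot's data or quantizer, and the helper names are hypothetical. Since Hadamard matrices are orthogonal, the quantization error measured in the rotated space equals the error after rotating back.

```python
import numpy as np

def sylvester_hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def quantize_per_token(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric round-to-nearest fake quantization with one scale per row (token)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=-1, keepdims=True) / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

d_model, head_dim, n_tokens = 256, 128, 2000
n_heads = d_model // head_dim
rng = np.random.default_rng(0)

# Assumed toy activations: head 0 is ~10x larger than head 1 for every token.
head_std = np.array([10.0, 1.0])
x = rng.standard_normal((n_tokens, n_heads, head_dim)) * head_std[None, :, None]

# Per-head size-128 Hadamard only (the part fused into W_v): the scale gap remains.
per_head = x @ sylvester_hadamard(head_dim).T
# Add the online cross-head Hadamard, which mixes element k of every head.
combined = np.einsum("ij,tjk->tik", sylvester_hadamard(n_heads), per_head)

def quant_mse(y: np.ndarray) -> float:
    """Mean squared per-token quantization error over all tokens and channels."""
    y = y.reshape(n_tokens, d_model)
    return float(np.mean((quantize_per_token(y) - y) ** 2))

err_per_head, err_combined = quant_mse(per_head), quant_mse(combined)
print(f"per-head only: {err_per_head:.3f}  with cross-head Hadamard: {err_combined:.3f}")
```

With per-head Hadamards only, the per-token scale is set by the large head, so most entries of the small head round to zero. Mixing the heads spreads both parts evenly over all 256 coordinates, which in this toy setup gives a noticeably smaller error.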

I remember we saw a non-trivial change when we did not apply those online Hadamard transformations. However, if you use group-wise quantization (with group size ≤ 128), you don't need those online Hadamard transforms at all, because the values of different heads are never combined and the scale of each part is calculated separately.
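The group-wise case can be checked with a similar toy setup. This sketch assumes synthetic Gaussian activations with a 10x scale gap between two heads and 4-bit symmetric round-to-nearest quantization with one scale per group of 128 values, so group boundaries line up with head boundaries; the data, quantizer, and helper names are illustrative assumptions, not QuaRot's implementation.

```python
import numpy as np

def sylvester_hadamard(n: int) -> np.ndarray:
    """Orthonormal Sylvester Hadamard matrix; n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def quantize_groupwise(x: np.ndarray, group_size: int, bits: int = 4) -> np.ndarray:
    """Symmetric round-to-nearest fake quantization with one scale per group of columns."""
    qmax = 2 ** (bits - 1) - 1
    g = x.reshape(*x.shape[:-1], -1, group_size)
    scale = np.abs(g).max(axis=-1, keepdims=True) / qmax
    return (np.clip(np.round(g / scale), -qmax - 1, qmax) * scale).reshape(x.shape)

d_model, head_dim, n_tokens = 256, 128, 2000
n_heads = d_model // head_dim
rng = np.random.default_rng(0)

# Assumed toy activations with a 10x scale gap between the two heads.
head_std = np.array([10.0, 1.0])
x = rng.standard_normal((n_tokens, n_heads, head_dim)) * head_std[None, :, None]

# Per-head Hadamard only, flattened back to (tokens, d_model).
per_head = (x @ sylvester_hadamard(head_dim).T).reshape(n_tokens, d_model)
# Full size-256 Hadamard, equal to per-head plus cross-head Hadamard (H_256 = H_2 kron H_128).
full = x.reshape(n_tokens, d_model) @ sylvester_hadamard(d_model).T

def group_mse(y: np.ndarray, group_size: int = 128) -> float:
    """Mean squared group-wise quantization error over all tokens and channels."""
    return float(np.mean((quantize_groupwise(y, group_size) - y) ** 2))

err_per_head, err_full = group_mse(per_head), group_mse(full)
print(f"group size 128, per-head only: {err_per_head:.3f}  full Hadamard: {err_full:.3f}")
```

Each group already holds exactly one head, so its scale adapts to that head's magnitude. In this toy setup the cross-head Hadamard only redistributes the same total energy across the two groups, and the two errors come out roughly equal.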

Please let me know if you have any other question.

sashkboos · Oct 29 '25 09:10