QuaRot
A question regarding the rotation matching pairs
SpinQuant is a follow-up work to QuaRot, but we noticed that the two papers pair their rotation matrices differently. In QuaRot, there is first an online Hadamard operation (with a dimension of head_dim) before o_proj. Second, o_weight is fused with a Hadamard matrix (H) spanning the full tensor dimension. Both are highlighted in the red box in the figure below.
In SpinQuant (figure below), the online Hadamard operation before o_proj is removed, and o_weight is instead fused with a Hadamard matrix (H in QuaRot's notation) of dimension head_dim.
Why is this the case? Are the rotation matrix pairings in the two papers equivalent?
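To make the question concrete, here is a minimal NumPy sketch of how I understand the two pairings. It is not taken from either codebase. It assumes a row-vector convention (`y = x @ W_o`), that the full-dimension Hadamard factorizes as a Kronecker product of an across-heads part and a per-head part, and that the per-head part is fused offline into the producer of `x` (v_proj). All names (`hadamard`, `per_head`, `across_heads`, and so on) are hypothetical, and a Hadamard matrix stands in for the SpinQuant per-head rotation purely for illustration.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Sylvester Hadamard matrix; n must be a power of 2."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

rng = np.random.default_rng(0)
n_heads, head_dim = 4, 8
hidden = n_heads * head_dim

x = rng.standard_normal((3, hidden))         # o_proj input, heads concatenated
W_o = rng.standard_normal((hidden, hidden))  # o_proj weight, y = x @ W_o
y_ref = x @ W_o

# Building blocks (assumed factorization: H_full = H_heads kron H_head_dim).
per_head = np.kron(np.eye(n_heads), hadamard(head_dim))      # block-diagonal
across_heads = np.kron(hadamard(n_heads), np.eye(head_dim))  # mixes heads
H_full = np.kron(hadamard(n_heads), hadamard(head_dim))

# QuaRot-style pairing as I read it: per-head rotation (assumed fused into
# v_proj), plus an online across-heads Hadamard, cancelled by the
# full-dimension Hadamard fused into o_weight.
x_quarot = (x @ per_head) @ across_heads
y_quarot = x_quarot @ (H_full.T @ W_o)

# SpinQuant-style pairing as I read it: per-head rotation only, cancelled
# by the same block-diagonal rotation fused into o_weight; no online step.
x_spin = x @ per_head
y_spin = x_spin @ (per_head.T @ W_o)

print(np.allclose(y_quarot, y_ref), np.allclose(y_spin, y_ref))
print(np.allclose(x_quarot, x_spin))
```

Under these assumptions both pairings reproduce the unrotated o_proj output, so each is function-preserving in full precision. The activations that would actually be quantized (`x_quarot` versus `x_spin`) differ, though, since only the first pairing mixes values across heads. So my question is really whether that difference is intended and whether it matters for quantization accuracy.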