QuaRot
Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.
kv_indicies -> kv_indices
Hi! Thanks for the great work! Been playing with the code today and trying to reproduce Figure 1 in the paper, and here's what I got.  I noticed that...
SpinQuant is a follow-up work to QuaRot. However, we have noticed that the two papers pair the rotation matrices differently. In QuaRot, first, there is...
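For readers following this thread: both papers rely on the same computational-invariance identity, and the disagreement is about which factor of the rotation is paired with which matrix. A minimal sketch of the identity itself (neither paper's specific pairing convention):

```python
import torch

# Computational-invariance check underlying QuaRot-style rotations:
# for an orthogonal Q, (W @ Q) @ (Q.T @ x) == W @ x, so Q can be fused
# into the weights and Q.T into the incoming activations without
# changing the layer's output. Which factor (Q vs. Q.T) attaches to
# which side is exactly the pairing detail that can differ between papers.
torch.manual_seed(0)
d = 8
W = torch.randn(d, d, dtype=torch.float64)
x = torch.randn(d, dtype=torch.float64)
Q, _ = torch.linalg.qr(torch.randn(d, d, dtype=torch.float64))  # random orthogonal matrix
assert torch.allclose((W @ Q) @ (Q.T @ x), W @ x, atol=1e-10)
```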
I encountered an unexpected precision loss while using QuaRot. I conducted comparison experiments on LLaMA-2-7b: performing w4a16 RTN quantization on the model resulted in a perplexity (PPL) of 7.354664. Performing...
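For context, a minimal sketch of what per-channel symmetric w4a16 RTN fake quantization typically looks like; this is an illustration, not the repo's exact quantizer (clipping ratio, group size, and zero-point handling all shift PPL):

```python
import torch

def rtn_fake_quant_w4(weight: torch.Tensor) -> torch.Tensor:
    """Per-output-channel symmetric 4-bit round-to-nearest (RTN).

    Returns fake-quantized weights (quantize then dequantize), so the
    result stays in the original dtype; the "a16" side of w4a16 keeps
    activations in 16-bit and needs no change here.
    """
    scale = weight.abs().amax(dim=1, keepdim=True) / 7.0  # int4 range [-8, 7]
    scale = scale.clamp(min=1e-8)
    q = torch.clamp(torch.round(weight / scale), -8, 7)
    return q * scale
```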
Same idea as [here](https://github.com/microsoft/TransformerCompression/blob/6b12cdee6ad51791d7c776b3a046bc408b9e77e9/src/slicegpt/layernorm_fusion.py#L83-L85). opt-125m is impacted by this.
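For anyone landing here without following the link: the referenced SliceGPT code folds the LayerNorm affine parameters into the following linear layer, which matters for OPT models because their LayerNorms carry non-trivial weights and biases. A minimal sketch of that folding (hypothetical helper name, biases handled in the simplest way):

```python
import torch

def fuse_ln_into_linear(ln: torch.nn.LayerNorm, linear: torch.nn.Linear) -> None:
    """Fold LayerNorm's affine weight/bias into the next linear layer.

    After fusion the LayerNorm is left with weight=1, bias=0 (pure
    normalization); the linear layer absorbs the scale and shift.
    """
    W = linear.weight.data.double()                       # (out, in)
    linear.weight.data = (W * ln.weight.data.double()).to(linear.weight.dtype)
    if ln.bias is not None:
        shift = W @ ln.bias.data.double()                 # (out,)
        if linear.bias is None:
            linear.bias = torch.nn.Parameter(
                torch.zeros(linear.out_features,
                            dtype=linear.weight.dtype,
                            device=linear.weight.device))
        linear.bias.data = (linear.bias.data.double() + shift).to(linear.bias.dtype)
        ln.bias.data.zero_()
    ln.weight.data.fill_(1.0)
```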
I've noticed QuaRot and other KV cache papers include perplexity, but it is unclear to me how a quantized KV cache is used during perplexity calculation. Do you have a...
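Not the authors, but the usual answer in this line of work: perplexity is computed teacher-forced over full sequences, and KV-cache quantization is simulated by fake-quantizing K and V right after their projections, before attention, so the quantization error still flows through the attention scores. A minimal sketch of such a fake quantizer (per-token asymmetric; real setups vary bits and group size):

```python
import torch

def fake_quant_kv(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Per-token asymmetric fake quantization along the head dimension.

    Applied to K and V after projection, so a quantized KV cache is
    emulated even in a single teacher-forced forward pass.
    """
    qmax = 2 ** bits - 1
    lo = x.amin(dim=-1, keepdim=True)
    hi = x.amax(dim=-1, keepdim=True)
    scale = ((hi - lo) / qmax).clamp(min=1e-8)
    q = torch.clamp(torch.round((x - lo) / scale), 0, qmax)
    return q * scale + lo
```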
Thanks for the wonderful work! However, I've encountered a problem with the code implementation as described in the Introduction of your paper....
Description: I am experiencing a significant precision drop when using the QuaRot algorithm on a device limited to float32 computation. The rotations, originally designed for double precision, are cast to...
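A minimal sketch of the pattern being described, with an illustrative function name: the rotation matmul runs in float64 and the result is cast back, so a float32-only device silently loses that headroom:

```python
import torch

def rotate_weight(weight: torch.Tensor, Q: torch.Tensor) -> torch.Tensor:
    # Apply an orthogonal rotation in float64 to limit rounding error,
    # then cast back to the original dtype. On a float32-only device the
    # .double() upcast is unavailable, so the matmul accumulates in
    # float32, which is the precision drop reported above.
    return (weight.double() @ Q.double()).to(weight.dtype)
```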
The main modifications to support Llama 3.1 and 3.2: - In the case of Llama 3.2, `tie_word_embeddings=True`, so we only need to apply the rotation once, on the input embedding, as...
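A minimal sketch of why one rotation suffices when embeddings are tied (attribute paths assume a Hugging Face Llama layout):

```python
import torch

def rotate_tied_embedding(model, Q: torch.Tensor) -> None:
    # With tie_word_embeddings=True, lm_head.weight shares storage with
    # embed_tokens.weight, so rotating the embedding once rotates both:
    # logits = (x @ Q) @ (W_E @ Q).T = x @ W_E.T, leaving outputs unchanged.
    W = model.model.embed_tokens.weight.data
    model.model.embed_tokens.weight.data = (W.double() @ Q.double()).to(W.dtype)
```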