QuaRot
Adding support for Llama 3.1 and Llama 3.2 models
The main modifications to support Llama 3.1 and 3.2:

- In the case of Llama 3.2, `tie_word_embeddings=True`, so we only need to apply the rotation once, on the input embeddings, since they share the same weights as the output ones.
- In the case of Llama 3.2, `config.num_key_value_heads` differs from `config.num_attention_heads`, so we need to use the full formula `config.hidden_size * config.num_key_value_heads / config.num_attention_heads` to get the right dimension.
- Adding one Hadamard matrix.
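To illustrate the second point, here is a minimal sketch (a hypothetical helper, not the PR's actual code) of how the KV projection dimension is derived for grouped-query-attention models, where `num_key_value_heads` is smaller than `num_attention_heads`; the sample values match Llama-3.2-1B's published config:

```python
def kv_dim(config):
    # Per-head dim times the number of KV heads, i.e. the full formula
    # hidden_size * num_key_value_heads / num_attention_heads.
    # Using hidden_size alone would be wrong for GQA models.
    return config.hidden_size * config.num_key_value_heads // config.num_attention_heads

class Llama32Config:
    # Values from Llama-3.2-1B (head_dim = 2048 / 32 = 64)
    hidden_size = 2048
    num_attention_heads = 32
    num_key_value_heads = 8

print(kv_dim(Llama32Config))  # 2048 * 8 // 32 = 512
```

This 512 is the output width of the `k_proj`/`v_proj` layers, which is why the Hadamard/rotation dimensions need the full formula rather than `config.hidden_size`.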
This is very helpful for my current work comparing QuaRot on Llama-3.2. Thank you very much! Would appreciate it if the author could review and merge if applicable. @sashkboos