QuaRot
Adding support for Llama 3.1 and Llama 3.2 models
The main modifications to support Llama 3.1 and 3.2:

- In the case of Llama 3.2, `tie_word_embeddings=True`, so we only need to apply the rotation once, on the input embeddings, since they share the same weights as the output ones.
- In the case of Llama 3.2, `config.num_key_value_heads` differs from `config.num_attention_heads`, so we need to use the full formula `config.hidden_size * config.num_key_value_heads / config.num_attention_heads` to get the right dimension.
- Adding one Hadamard matrix.
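To illustrate the second point, here is a minimal sketch (a hypothetical helper, not the PR's actual code) of how the KV projection dimension is derived for grouped-query-attention models, where `num_key_value_heads` is smaller than `num_attention_heads`; the sample values match Llama-3.2-1B's published config:

```python
def kv_dim(config):
    # Per-head dim times the number of KV heads, i.e. the full formula
    # hidden_size * num_key_value_heads / num_attention_heads.
    # Using hidden_size alone would be wrong for GQA models.
    return config.hidden_size * config.num_key_value_heads // config.num_attention_heads

class Llama32Config:
    # Values from Llama-3.2-1B (head_dim = 2048 / 32 = 64)
    hidden_size = 2048
    num_attention_heads = 32
    num_key_value_heads = 8

print(kv_dim(Llama32Config))  # 2048 * 8 // 32 = 512
```

This 512 is the output width of the `k_proj`/`v_proj` layers, which is why the Hadamard/rotation dimensions need the full formula rather than `config.hidden_size`.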
This is very helpful for my current work comparing QuaRot on Llama-3.2. Thank you very much! Would appreciate it if the author could review and merge if applicable. @sashkboos