Error in __init_rope of KblamLlamaAttention

Open ThomasHoppe opened this issue 10 months ago • 2 comments

It seems that the configuration in 'meta-llama/Llama-3.2-1B-Instruct/resolve/main/config.json' has changed since the code was last used.

Running the training on the Enron dataset gives:

  File "/home/fokus/Thomas/KBLaM/src/kblam/models/llama3_model.py", line 118, in __init__
    self._init_rope()
    ~~~~~~~~~~~~~~~^^
  File "/home/fokus/Thomas/KBLaM/src/kblam/models/llama3_model.py", line 128, in _init_rope
    scaling_type = self.config.rope_scaling["type"]
                   ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'type'

Printing out self.config.rope_scaling gives:

{'factor': 32.0, 'high_freq_factor': 4.0, 'low_freq_factor': 1.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

I assume that the rope_type is fetched from https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/resolve/main/config.json.

Changing self.config.rope_scaling["type"] to self.config.rope_scaling["rope_type"] now gives:

  File "/home/fokus/Thomas/KBLaM/src/kblam/models/llama3_model.py", line 146, in _init_rope
    raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
ValueError: Unknown RoPE scaling type llama3

since only the values 'linear' or 'dynamic' are allowed in _init_rope().
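Both failures can be reproduced on the plain dictionary, without loading the model. The following is only a minimal sketch: `resolve_rope_type` and `SUPPORTED_IN_INIT_ROPE` are hypothetical names introduced here for illustration and are not part of KBLaM or transformers, and the legacy dictionary is an assumed example of the older "type" layout. It reads the scaling type from either key and shows that 'llama3' still falls outside the two values that _init_rope() accepts.

```python
from typing import Optional

# Hypothetical constant: the two values the error message says _init_rope() accepts.
SUPPORTED_IN_INIT_ROPE = {"linear", "dynamic"}


def resolve_rope_type(rope_scaling: Optional[dict]) -> Optional[str]:
    """Return the RoPE scaling type from either the 'rope_type' or the 'type' key.

    Hypothetical helper for illustration; not part of KBLaM or transformers.
    """
    if rope_scaling is None:
        return None
    # Prefer the key found in the Llama-3.2 config, fall back to the older one.
    return rope_scaling.get("rope_type", rope_scaling.get("type"))


# The dictionary printed in this issue.
llama32_scaling = {
    "factor": 32.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}

# Assumed example of the older layout that the current code expects.
legacy_scaling = {"type": "linear", "factor": 2.0}

print(resolve_rope_type(llama32_scaling))  # llama3
print(resolve_rope_type(legacy_scaling))   # linear
print(resolve_rope_type(llama32_scaling) in SUPPORTED_IN_INIT_ROPE)  # False
```

A tolerant lookup like this only removes the KeyError. Handling the 'llama3' scaling type itself would still need a matching branch in _init_rope() or a pinned library version.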

ThomasHoppe, Apr 03 '25 08:04

I believe this is solved by sticking with huggingface 4.46.0 in #40. Could you let me know if that's fixed things?

ti250, Apr 14 '25 12:04

> I believe this is solved by sticking with huggingface 4.46.0 in #40. Could you let me know if that's fixed things?

Well, I am not sure. I think I was experimenting with training on Enron with the Llama model first, unaware that the key-value embeddings should be generated first. So I think it is not related to the transformers==4.46.0 issue. Anyway, I think the mismatch between the dictionary key "type" used in the code and the key "rope_type" actually present in the config still remains ...

I think the problem is merely hidden when the key-value embeddings are generated first.

ThomasHoppe, Apr 25 '25 11:04