
Llama3.1 models do not allow configuring `max_seq_len`

akashc1 opened this issue Dec 23 '24 • 6 comments

Llama 3.1 model builders hardcode the max context length, even though the component builders allow specifying it.

And since the QLoRA versions also use these builders, they are affected too. This prevents anyone from specifying the model's max_seq_len from a config or the CLI. For example, this config will throw an error:

output_dir: /tmp/torchtune/llama3_1_8B/lora # /tmp may be deleted by your system. Change it to your preference.
max_seq_len: 8192

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /models/meta-llama/Llama-3.1-8B-Instruct/original/tokenizer.model
  max_seq_len: ${max_seq_len}

# Model Arguments
model:
  _component_: torchtune.models.llama3_1.lora_llama3_1_8b
  lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']
  apply_lora_to_mlp: True
  apply_lora_to_output: False
  lora_rank: 8  # higher increases accuracy and memory
  lora_alpha: 16  # usually alpha=2*rank
  lora_dropout: 0.0
  max_seq_len: ${max_seq_len}

[rank4]: Traceback (most recent call last):
[rank4]:   File "/torchtune/recipes/lora_finetune_distributed.py", line 938, in <module>
[rank4]:     sys.exit(recipe_main())
[rank4]:   File "/torchtune/torchtune/config/_parse.py", line 99, in wrapper
[rank4]:     sys.exit(recipe_main(conf))
[rank4]:   File "/torchtune/recipes/lora_finetune_distributed.py", line 932, in recipe_main
[rank4]:     recipe.setup(cfg=cfg)
[rank4]:   File "/torchtune/recipes/lora_finetune_distributed.py", line 272, in setup
[rank4]:     self._model = self._setup_model(
[rank4]:   File "/torchtune/recipes/lora_finetune_distributed.py", line 453, in _setup_model
[rank4]:     model = config.instantiate(cfg_model)
[rank4]:   File "/torchtune/torchtune/config/_instantiate.py", line 112, in instantiate
[rank4]:     return _instantiate_node(OmegaConf.to_object(config), *args)
[rank4]:   File "/torchtune/torchtune/config/_instantiate.py", line 33, in _instantiate_node
[rank4]:     return _create_component(_component_, args, kwargs)
[rank4]:   File "/torchtune/torchtune/config/_instantiate.py", line 22, in _create_component
[rank4]:     return _component_(*args, **kwargs)
[rank4]: TypeError: lora_llama3_1_8b() got an unexpected keyword argument 'max_seq_len'
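The failure mode reduces to the model builder not forwarding the keyword to the component builder. The sketch below is a heavily simplified, hypothetical illustration (the function names mirror torchtune's, but the bodies are stand-ins, not the real source):

```python
def _build_model(max_seq_len: int, lora_rank: int) -> dict:
    """Stand-in for the component builder, which does accept max_seq_len."""
    return {"max_seq_len": max_seq_len, "lora_rank": lora_rank}

def lora_llama3_1_8b_hardcoded(lora_rank: int = 8) -> dict:
    # The model builder pins max_seq_len internally, so a config entry
    # `max_seq_len: ...` becomes an unexpected keyword argument (the
    # TypeError in the traceback above).
    return _build_model(max_seq_len=131072, lora_rank=lora_rank)

def lora_llama3_1_8b_fixed(lora_rank: int = 8, max_seq_len: int = 131072) -> dict:
    # Forwarding the kwarg lets configs and the CLI override it.
    return _build_model(max_seq_len=max_seq_len, lora_rank=lora_rank)
```

Calling `lora_llama3_1_8b_hardcoded(max_seq_len=8192)` raises the same `TypeError` shape as the traceback, while the forwarding variant accepts the override.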

My workload, and I'm sure others' as well, requires specifying a different context length.

akashc1 commented Dec 23 '24

Hey @akashc1, I see. We should fix it, but to be clear, you don't have to redefine max_seq_len for the model. It is used only for the positional embeddings; you can leave it at 131k unless you are trying to go beyond that.

For your specific case, where you just want to limit the sequence length in a batch, change it only in the tokenizer:

output_dir: /tmp/torchtune/llama3_1_8B/lora # /tmp may be deleted by your system. Change it to your preference.

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /models/meta-llama/Llama-3.1-8B-Instruct/original/tokenizer.model
  max_seq_len: 8192

# Model Arguments
model:
  _component_: torchtune.models.llama3_1.lora_llama3_1_8b
  lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']
  apply_lora_to_mlp: True
  apply_lora_to_output: False
  lora_rank: 8  # higher increases accuracy and memory
  lora_alpha: 16  # usually alpha=2*rank
  lora_dropout: 0.0
  # removed max_seq_len
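The distinction between the two settings can be sketched like this (toy code with assumed behavior, not torchtune's actual tokenizer or attention implementation): the tokenizer-side max_seq_len truncates samples, while the model-side max_seq_len only sizes the positional-embedding cache and is checked at forward time.

```python
def tokenize(text: str, max_seq_len: int) -> list[int]:
    # Toy "tokenizer": the tokenizer-side max_seq_len truncates the sample.
    tokens = list(text.encode("utf-8"))
    return tokens[:max_seq_len]

def forward(tokens: list[int], model_max_seq_len: int = 131072) -> int:
    # The model-side max_seq_len only bounds the positional-embedding cache;
    # inputs longer than it are rejected rather than truncated.
    if len(tokens) > model_max_seq_len:
        raise ValueError(f"seq_len {len(tokens)} > max_seq_len {model_max_seq_len}")
    return len(tokens)
```

Under these assumptions, `tokenize(text, 8192)` always fits a model built with the 131k default, which is why tokenizer-side truncation alone suffices when you only want shorter batches.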

felipemello1 commented Dec 24 '24

@felipemello1 yes, I understand that; however, the transformer implementation does throw an error if it gets a seq_len longer than it was expecting at init. I've run into this when, e.g., training the Llama 3.1 405B LoRA with tokenizer.max_seq_len = 16384: if you don't also specify it in the model config, the model by default expects max_seq_len = 8192. This is especially pertinent with data packing.

This is an example config I've been using to ensure the tokenizer and model both produce/expect the same thing:

max_seq_len: 16384

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  ...
  max_seq_len: ${max_seq_len}

# Model Arguments
model:
  _component_: torchtune.models.llama3_1.lora_llama3_1_405b
  ...
  max_seq_len: ${max_seq_len}

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.instruct_dataset
  ...
  packed: True
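The packing case is where the mismatch bites, because packing concatenates samples until each sequence approaches the tokenizer's max_seq_len, so the model reliably sees near-maximal batches. A toy sketch of greedy packing (an assumed simplification, not torchtune's PackedDataset):

```python
def pack(samples: list[list[int]], max_seq_len: int) -> list[list[int]]:
    # Greedy packing: concatenate token lists into sequences of up to
    # max_seq_len tokens, starting a new sequence when the next sample
    # would overflow. (Assumes no single sample exceeds max_seq_len.)
    packs: list[list[int]] = []
    current: list[int] = []
    for sample in samples:
        if len(current) + len(sample) > max_seq_len:
            packs.append(current)
            current = []
        current.extend(sample)
    if current:
        packs.append(current)
    return packs
```

With packing filling sequences toward 16384 here, a model built with max_seq_len = 8192 would reject nearly every batch, so tokenizer and model must agree.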

Let me know if this explains what I'm running into, or if I can clarify further :)

akashc1 commented Dec 24 '24

Oh, I see! OK, there is a deeper problem: we set max_seq_len for 405B to 8k, which is wrong. The model was trained for >131k. Thanks for raising it.

I left a comment on your PR (https://github.com/pytorch/torchtune/pull/2203/files): let's just update the docstring and I will approve it. By the way, you should try Llama 3.3 70B; it has better or equivalent performance to 405B.

felipemello1 commented Dec 24 '24

@ebsmothers, can you sanity-check that it's wrong that we set max_seq_len to 8k? https://github.com/pytorch/torchtune/blob/aa8f365f91a69aa36aaea14cf6f03ccd45310bb6/torchtune/models/llama3_1/_model_builders.py#L239C9-L239C26

felipemello1 commented Dec 24 '24

@ebsmothers bump

joecummings commented Jan 6 '25

@felipemello1 @joecummings sorry I missed this until now. This looks like a bug to me; can we go ahead with #2203 as the fix? (With the caveat that we shouldn't hardcode the max_seq_len in the base llama3_1 builder, as that is inconsistent with what we do in other places. See e.g. the llama3 builder here.)

ebsmothers commented Jan 7 '25

Closed by #2203

joecummings commented Jan 14 '25

What if I want to set a pretrained model path? I do not want to use the /tmp directory.

chuangzhidan commented Jan 20 '25