
XLoRA: training issues, Gradients will be None

Open • benjamin-marie opened this issue 1 year ago • 9 comments

I installed PEFT from source and use the latest versions of Transformers and TRL. I passed the X-LoRA model to TRL, but training doesn't seem to work: the training loss doesn't decrease and the validation loss remains constant. I get this warning: UserWarning: None of the inputs have requires_grad=True. Gradients will be None

I load Llama 3.1 (without quantization) along these lines:
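
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough reconstruction of the loading step; the checkpoint name and exact
# arguments are assumptions, not the literal code from my script.
model_name = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,               # matches bf16=True in the trainer config below
    attn_implementation="flash_attention_2",  # flash_attn 2.6.3 shows up in the training logs
    device_map="auto",
)

Then I run this code: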

from peft import PeftType, TaskType, XLoraConfig, get_peft_model
from trl import SFTConfig, SFTTrainer

adapters = dict()
adapters["0"] = './adapter1/'
adapters["1"] = './adapter2/'

peft_config = XLoraConfig(
    task_type=TaskType.CAUSAL_LM,
    peft_type=PeftType.XLORA,
    hidden_size=model.config.hidden_size,
    xlora_depth=8,
    adapters=adapters,
    xlora_size=2048,
    layerwise_scalings=True,
    xlora_dropout_p=0.2,
)

xlora_model = get_peft_model(model, peft_config)

training_arguments = SFTConfig(
    output_dir="./output/",
    optim="paged_adamw_8bit",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    save_strategy="epoch",
    log_level="debug",
    logging_steps=1,
    learning_rate=1e-5,
    bf16=True,
    num_train_epochs=1,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    dataset_text_field="text",
    max_seq_length=512,
)

trainer = SFTTrainer(
    model=xlora_model,
    train_dataset=ds,
    tokenizer=tokenizer,
    args=training_arguments,
)

trainer.train()

I also observed another bug: the adapters must be named "0", "1", etc. in the adapters dict(), otherwise training won't start and an error says that the adapters don't exist.
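
For example (the descriptive keys in the second line are made up just to illustrate the failure):

adapters = {"0": "./adapter1/", "1": "./adapter2/"}          # works
adapters = {"style": "./adapter1/", "facts": "./adapter2/"}  # fails: training aborts, saying the adapters don't exist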

Maybe @EricLBuehler can help with this?

benjamin-marie avatar Aug 18 '24 10:08 benjamin-marie

This sounds like the X-LoRA classifier layers don't have requires_grad=True. Could you please print all parameter names with requires_grad=True on your model? What is your base model?

We're still working on a training example for X-LoRA, so it's possible that there are still some kinks that need to be ironed out.

BenjaminBossan avatar Aug 19 '24 10:08 BenjaminBossan

@benjamin-marie thanks for the example. I'll take a look at this.

I also observed another bug: The adapters must be named "0", "1", etc in the adapters dict() otherwise training won't start and will say that the adapters don't exist.

Hmm ok, thanks for reporting this, I'll see what could be causing it.

EricLBuehler avatar Aug 19 '24 10:08 EricLBuehler

Here is my model (Llama 3.1 8B):

PeftModelForCausalLM(
  (base_model): XLoraModel(
    (lora_model): LoraModel(
      (model): LlamaForCausalLM(
        (model): LlamaModel(
          (embed_tokens): Embedding(128256, 4096)
          (layers): ModuleList(
            (0-31): 32 x LlamaDecoderLayer(
              (self_attn): LlamaFlashAttention2(
                (q_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=4096, bias=False)
                    (1): Linear(in_features=16, out_features=4096, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (k_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=1024, bias=False)
                    (1): Linear(in_features=16, out_features=1024, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (v_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=1024, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=1024, bias=False)
                    (1): Linear(in_features=16, out_features=1024, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (o_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=4096, bias=False)
                    (1): Linear(in_features=16, out_features=4096, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (rotary_emb): LlamaRotaryEmbedding()
              )
              (mlp): LlamaMLP(
                (gate_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=14336, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=14336, bias=False)
                    (1): Linear(in_features=16, out_features=14336, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (up_proj): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=14336, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=4096, out_features=16, bias=False)
                    (1): Linear(in_features=4096, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=14336, bias=False)
                    (1): Linear(in_features=16, out_features=14336, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (down_proj): lora.Linear(
                  (base_layer): Linear(in_features=14336, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict(
                    (0): Dropout(p=0.05, inplace=False)
                    (1): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (0): Linear(in_features=14336, out_features=16, bias=False)
                    (1): Linear(in_features=14336, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (0): Linear(in_features=16, out_features=4096, bias=False)
                    (1): Linear(in_features=16, out_features=4096, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (act_fn): SiLU()
              )
              (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
              (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
            )
          )
          (norm): LlamaRMSNorm((4096,), eps=1e-05)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
      )
    )
    (internal_xlora_classifier): XLoraClassifier(
      (softmax): TemperatureScaledSoftmax(
        (softmax): Softmax(dim=-1)
      )
      (layers): Sequential(
        (0): Linear(in_features=4096, out_features=2048, bias=True)
        (1): ReLU()
        (2): Dropout(p=0.2, inplace=False)
        (3): Linear(in_features=2048, out_features=2048, bias=True)
        (4): ReLU()
        (5): Dropout(p=0.2, inplace=False)
        (6): Linear(in_features=2048, out_features=2048, bias=True)
        (7): ReLU()
        (8): Dropout(p=0.2, inplace=False)
        (9): Linear(in_features=2048, out_features=2048, bias=True)
        (10): ReLU()
        (11): Dropout(p=0.2, inplace=False)
        (12): Linear(in_features=2048, out_features=2048, bias=True)
        (13): ReLU()
        (14): Dropout(p=0.2, inplace=False)
        (15): Linear(in_features=2048, out_features=2048, bias=True)
        (16): ReLU()
        (17): Dropout(p=0.2, inplace=False)
        (18): Linear(in_features=2048, out_features=2048, bias=True)
        (19): ReLU()
        (20): Dropout(p=0.2, inplace=False)
        (21): Linear(in_features=2048, out_features=448, bias=True)
      )
    )
  )
)

Could you please print all parameter names with requires_grad=True on your model?

Sure, how do you do this? None of the params seem to have requires_grad set, but I'm not sure whether I checked it correctly.

benjamin-marie avatar Aug 19 '24 15:08 benjamin-marie

how do you do this

First of all, you can run model.print_trainable_parameters() for a global overview. Then something like this should do:

for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
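
With an X-LoRA model you could also filter for the gating classifier specifically (using the internal_xlora_classifier name from your module printout):

for name, param in model.named_parameters():
    if "internal_xlora_classifier" in name:
        print(name, param.requires_grad)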

BenjaminBossan avatar Aug 19 '24 15:08 BenjaminBossan

I added this code:

print(xlora_model.print_trainable_parameters())  # note: this method prints directly and returns None
print("--- Require grad? ----")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
print("----------------------")

It prints:

trainable params: 118,372,800 || all params: 8,148,634,048 || trainable%: 1.4527
None
--- Require grad? ----
model.layers.0.self_attn.q_proj.lora_A.0.weight
model.layers.0.self_attn.q_proj.lora_A.1.weight
model.layers.0.self_attn.q_proj.lora_B.0.weight
model.layers.0.self_attn.q_proj.lora_B.1.weight
model.layers.0.self_attn.k_proj.lora_A.0.weight
model.layers.0.self_attn.k_proj.lora_A.1.weight
model.layers.0.self_attn.k_proj.lora_B.0.weight
model.layers.0.self_attn.k_proj.lora_B.1.weight
model.layers.0.self_attn.v_proj.lora_A.0.weight
model.layers.0.self_attn.v_proj.lora_A.1.weight
model.layers.0.self_attn.v_proj.lora_B.0.weight
model.layers.0.self_attn.v_proj.lora_B.1.weight
model.layers.0.self_attn.o_proj.lora_A.0.weight
model.layers.0.self_attn.o_proj.lora_A.1.weight
model.layers.0.self_attn.o_proj.lora_B.0.weight
model.layers.0.self_attn.o_proj.lora_B.1.weight
model.layers.0.mlp.gate_proj.lora_A.0.weight
model.layers.0.mlp.gate_proj.lora_A.1.weight
model.layers.0.mlp.gate_proj.lora_B.0.weight
model.layers.0.mlp.gate_proj.lora_B.1.weight
model.layers.0.mlp.up_proj.lora_A.0.weight
model.layers.0.mlp.up_proj.lora_A.1.weight
model.layers.0.mlp.up_proj.lora_B.0.weight
model.layers.0.mlp.up_proj.lora_B.1.weight
model.layers.0.mlp.down_proj.lora_A.0.weight
model.layers.0.mlp.down_proj.lora_A.1.weight
model.layers.0.mlp.down_proj.lora_B.0.weight
model.layers.0.mlp.down_proj.lora_B.1.weight
model.layers.1.self_attn.q_proj.lora_A.0.weight
model.layers.1.self_attn.q_proj.lora_A.1.weight
model.layers.1.self_attn.q_proj.lora_B.0.weight
model.layers.1.self_attn.q_proj.lora_B.1.weight
model.layers.1.self_attn.k_proj.lora_A.0.weight
model.layers.1.self_attn.k_proj.lora_A.1.weight
model.layers.1.self_attn.k_proj.lora_B.0.weight
model.layers.1.self_attn.k_proj.lora_B.1.weight
model.layers.1.self_attn.v_proj.lora_A.0.weight
model.layers.1.self_attn.v_proj.lora_A.1.weight
model.layers.1.self_attn.v_proj.lora_B.0.weight
model.layers.1.self_attn.v_proj.lora_B.1.weight
model.layers.1.self_attn.o_proj.lora_A.0.weight
model.layers.1.self_attn.o_proj.lora_A.1.weight
model.layers.1.self_attn.o_proj.lora_B.0.weight
model.layers.1.self_attn.o_proj.lora_B.1.weight
model.layers.1.mlp.gate_proj.lora_A.0.weight
model.layers.1.mlp.gate_proj.lora_A.1.weight
model.layers.1.mlp.gate_proj.lora_B.0.weight
model.layers.1.mlp.gate_proj.lora_B.1.weight
model.layers.1.mlp.up_proj.lora_A.0.weight
model.layers.1.mlp.up_proj.lora_A.1.weight
model.layers.1.mlp.up_proj.lora_B.0.weight
model.layers.1.mlp.up_proj.lora_B.1.weight
model.layers.1.mlp.down_proj.lora_A.0.weight
model.layers.1.mlp.down_proj.lora_A.1.weight
model.layers.1.mlp.down_proj.lora_B.0.weight
model.layers.1.mlp.down_proj.lora_B.1.weight
model.layers.2.self_attn.q_proj.lora_A.0.weight
model.layers.2.self_attn.q_proj.lora_A.1.weight
model.layers.2.self_attn.q_proj.lora_B.0.weight
model.layers.2.self_attn.q_proj.lora_B.1.weight
model.layers.2.self_attn.k_proj.lora_A.0.weight
model.layers.2.self_attn.k_proj.lora_A.1.weight
model.layers.2.self_attn.k_proj.lora_B.0.weight
model.layers.2.self_attn.k_proj.lora_B.1.weight
model.layers.2.self_attn.v_proj.lora_A.0.weight
model.layers.2.self_attn.v_proj.lora_A.1.weight
model.layers.2.self_attn.v_proj.lora_B.0.weight
model.layers.2.self_attn.v_proj.lora_B.1.weight
model.layers.2.self_attn.o_proj.lora_A.0.weight
model.layers.2.self_attn.o_proj.lora_A.1.weight
model.layers.2.self_attn.o_proj.lora_B.0.weight
model.layers.2.self_attn.o_proj.lora_B.1.weight
model.layers.2.mlp.gate_proj.lora_A.0.weight
model.layers.2.mlp.gate_proj.lora_A.1.weight
model.layers.2.mlp.gate_proj.lora_B.0.weight
model.layers.2.mlp.gate_proj.lora_B.1.weight
model.layers.2.mlp.up_proj.lora_A.0.weight
model.layers.2.mlp.up_proj.lora_A.1.weight
model.layers.2.mlp.up_proj.lora_B.0.weight
model.layers.2.mlp.up_proj.lora_B.1.weight
model.layers.2.mlp.down_proj.lora_A.0.weight
model.layers.2.mlp.down_proj.lora_A.1.weight
model.layers.2.mlp.down_proj.lora_B.0.weight
model.layers.2.mlp.down_proj.lora_B.1.weight
model.layers.3.self_attn.q_proj.lora_A.0.weight
model.layers.3.self_attn.q_proj.lora_A.1.weight
model.layers.3.self_attn.q_proj.lora_B.0.weight
model.layers.3.self_attn.q_proj.lora_B.1.weight
model.layers.3.self_attn.k_proj.lora_A.0.weight
model.layers.3.self_attn.k_proj.lora_A.1.weight
model.layers.3.self_attn.k_proj.lora_B.0.weight
model.layers.3.self_attn.k_proj.lora_B.1.weight
model.layers.3.self_attn.v_proj.lora_A.0.weight
model.layers.3.self_attn.v_proj.lora_A.1.weight
model.layers.3.self_attn.v_proj.lora_B.0.weight
model.layers.3.self_attn.v_proj.lora_B.1.weight
model.layers.3.self_attn.o_proj.lora_A.0.weight
model.layers.3.self_attn.o_proj.lora_A.1.weight
model.layers.3.self_attn.o_proj.lora_B.0.weight
model.layers.3.self_attn.o_proj.lora_B.1.weight
model.layers.3.mlp.gate_proj.lora_A.0.weight
model.layers.3.mlp.gate_proj.lora_A.1.weight
model.layers.3.mlp.gate_proj.lora_B.0.weight
model.layers.3.mlp.gate_proj.lora_B.1.weight
model.layers.3.mlp.up_proj.lora_A.0.weight
model.layers.3.mlp.up_proj.lora_A.1.weight
model.layers.3.mlp.up_proj.lora_B.0.weight
model.layers.3.mlp.up_proj.lora_B.1.weight
model.layers.3.mlp.down_proj.lora_A.0.weight
model.layers.3.mlp.down_proj.lora_A.1.weight
model.layers.3.mlp.down_proj.lora_B.0.weight
model.layers.3.mlp.down_proj.lora_B.1.weight
model.layers.4.self_attn.q_proj.lora_A.0.weight
model.layers.4.self_attn.q_proj.lora_A.1.weight
model.layers.4.self_attn.q_proj.lora_B.0.weight
model.layers.4.self_attn.q_proj.lora_B.1.weight
model.layers.4.self_attn.k_proj.lora_A.0.weight
model.layers.4.self_attn.k_proj.lora_A.1.weight
model.layers.4.self_attn.k_proj.lora_B.0.weight
model.layers.4.self_attn.k_proj.lora_B.1.weight
model.layers.4.self_attn.v_proj.lora_A.0.weight
model.layers.4.self_attn.v_proj.lora_A.1.weight
model.layers.4.self_attn.v_proj.lora_B.0.weight
model.layers.4.self_attn.v_proj.lora_B.1.weight
model.layers.4.self_attn.o_proj.lora_A.0.weight
model.layers.4.self_attn.o_proj.lora_A.1.weight
model.layers.4.self_attn.o_proj.lora_B.0.weight
model.layers.4.self_attn.o_proj.lora_B.1.weight
model.layers.4.mlp.gate_proj.lora_A.0.weight
model.layers.4.mlp.gate_proj.lora_A.1.weight
model.layers.4.mlp.gate_proj.lora_B.0.weight
model.layers.4.mlp.gate_proj.lora_B.1.weight
model.layers.4.mlp.up_proj.lora_A.0.weight
model.layers.4.mlp.up_proj.lora_A.1.weight
model.layers.4.mlp.up_proj.lora_B.0.weight
model.layers.4.mlp.up_proj.lora_B.1.weight
model.layers.4.mlp.down_proj.lora_A.0.weight
model.layers.4.mlp.down_proj.lora_A.1.weight
model.layers.4.mlp.down_proj.lora_B.0.weight
model.layers.4.mlp.down_proj.lora_B.1.weight
model.layers.5.self_attn.q_proj.lora_A.0.weight
model.layers.5.self_attn.q_proj.lora_A.1.weight
model.layers.5.self_attn.q_proj.lora_B.0.weight
model.layers.5.self_attn.q_proj.lora_B.1.weight
model.layers.5.self_attn.k_proj.lora_A.0.weight
model.layers.5.self_attn.k_proj.lora_A.1.weight
model.layers.5.self_attn.k_proj.lora_B.0.weight
model.layers.5.self_attn.k_proj.lora_B.1.weight
model.layers.5.self_attn.v_proj.lora_A.0.weight
model.layers.5.self_attn.v_proj.lora_A.1.weight
model.layers.5.self_attn.v_proj.lora_B.0.weight
model.layers.5.self_attn.v_proj.lora_B.1.weight
model.layers.5.self_attn.o_proj.lora_A.0.weight
model.layers.5.self_attn.o_proj.lora_A.1.weight
model.layers.5.self_attn.o_proj.lora_B.0.weight
model.layers.5.self_attn.o_proj.lora_B.1.weight
model.layers.5.mlp.gate_proj.lora_A.0.weight
model.layers.5.mlp.gate_proj.lora_A.1.weight
model.layers.5.mlp.gate_proj.lora_B.0.weight
model.layers.5.mlp.gate_proj.lora_B.1.weight
model.layers.5.mlp.up_proj.lora_A.0.weight
model.layers.5.mlp.up_proj.lora_A.1.weight
model.layers.5.mlp.up_proj.lora_B.0.weight
model.layers.5.mlp.up_proj.lora_B.1.weight
model.layers.5.mlp.down_proj.lora_A.0.weight
model.layers.5.mlp.down_proj.lora_A.1.weight
model.layers.5.mlp.down_proj.lora_B.0.weight
model.layers.5.mlp.down_proj.lora_B.1.weight
model.layers.6.self_attn.q_proj.lora_A.0.weight
model.layers.6.self_attn.q_proj.lora_A.1.weight
model.layers.6.self_attn.q_proj.lora_B.0.weight
model.layers.6.self_attn.q_proj.lora_B.1.weight
model.layers.6.self_attn.k_proj.lora_A.0.weight
model.layers.6.self_attn.k_proj.lora_A.1.weight
model.layers.6.self_attn.k_proj.lora_B.0.weight
model.layers.6.self_attn.k_proj.lora_B.1.weight
model.layers.6.self_attn.v_proj.lora_A.0.weight
model.layers.6.self_attn.v_proj.lora_A.1.weight
model.layers.6.self_attn.v_proj.lora_B.0.weight
model.layers.6.self_attn.v_proj.lora_B.1.weight
model.layers.6.self_attn.o_proj.lora_A.0.weight
model.layers.6.self_attn.o_proj.lora_A.1.weight
model.layers.6.self_attn.o_proj.lora_B.0.weight
model.layers.6.self_attn.o_proj.lora_B.1.weight
model.layers.6.mlp.gate_proj.lora_A.0.weight
model.layers.6.mlp.gate_proj.lora_A.1.weight
model.layers.6.mlp.gate_proj.lora_B.0.weight
model.layers.6.mlp.gate_proj.lora_B.1.weight
model.layers.6.mlp.up_proj.lora_A.0.weight
model.layers.6.mlp.up_proj.lora_A.1.weight
model.layers.6.mlp.up_proj.lora_B.0.weight
model.layers.6.mlp.up_proj.lora_B.1.weight
model.layers.6.mlp.down_proj.lora_A.0.weight
model.layers.6.mlp.down_proj.lora_A.1.weight
model.layers.6.mlp.down_proj.lora_B.0.weight
model.layers.6.mlp.down_proj.lora_B.1.weight
model.layers.7.self_attn.q_proj.lora_A.0.weight
model.layers.7.self_attn.q_proj.lora_A.1.weight
model.layers.7.self_attn.q_proj.lora_B.0.weight
model.layers.7.self_attn.q_proj.lora_B.1.weight
model.layers.7.self_attn.k_proj.lora_A.0.weight
model.layers.7.self_attn.k_proj.lora_A.1.weight
model.layers.7.self_attn.k_proj.lora_B.0.weight
model.layers.7.self_attn.k_proj.lora_B.1.weight
model.layers.7.self_attn.v_proj.lora_A.0.weight
model.layers.7.self_attn.v_proj.lora_A.1.weight
model.layers.7.self_attn.v_proj.lora_B.0.weight
model.layers.7.self_attn.v_proj.lora_B.1.weight
model.layers.7.self_attn.o_proj.lora_A.0.weight
model.layers.7.self_attn.o_proj.lora_A.1.weight
model.layers.7.self_attn.o_proj.lora_B.0.weight
model.layers.7.self_attn.o_proj.lora_B.1.weight
model.layers.7.mlp.gate_proj.lora_A.0.weight
model.layers.7.mlp.gate_proj.lora_A.1.weight
model.layers.7.mlp.gate_proj.lora_B.0.weight
model.layers.7.mlp.gate_proj.lora_B.1.weight
model.layers.7.mlp.up_proj.lora_A.0.weight
model.layers.7.mlp.up_proj.lora_A.1.weight
model.layers.7.mlp.up_proj.lora_B.0.weight
model.layers.7.mlp.up_proj.lora_B.1.weight
model.layers.7.mlp.down_proj.lora_A.0.weight
model.layers.7.mlp.down_proj.lora_A.1.weight
model.layers.7.mlp.down_proj.lora_B.0.weight
model.layers.7.mlp.down_proj.lora_B.1.weight
model.layers.8.self_attn.q_proj.lora_A.0.weight
model.layers.8.self_attn.q_proj.lora_A.1.weight
model.layers.8.self_attn.q_proj.lora_B.0.weight
model.layers.8.self_attn.q_proj.lora_B.1.weight
model.layers.8.self_attn.k_proj.lora_A.0.weight
model.layers.8.self_attn.k_proj.lora_A.1.weight
model.layers.8.self_attn.k_proj.lora_B.0.weight
model.layers.8.self_attn.k_proj.lora_B.1.weight
model.layers.8.self_attn.v_proj.lora_A.0.weight
model.layers.8.self_attn.v_proj.lora_A.1.weight
model.layers.8.self_attn.v_proj.lora_B.0.weight
model.layers.8.self_attn.v_proj.lora_B.1.weight
model.layers.8.self_attn.o_proj.lora_A.0.weight
model.layers.8.self_attn.o_proj.lora_A.1.weight
model.layers.8.self_attn.o_proj.lora_B.0.weight
model.layers.8.self_attn.o_proj.lora_B.1.weight
model.layers.8.mlp.gate_proj.lora_A.0.weight
model.layers.8.mlp.gate_proj.lora_A.1.weight
model.layers.8.mlp.gate_proj.lora_B.0.weight
model.layers.8.mlp.gate_proj.lora_B.1.weight
model.layers.8.mlp.up_proj.lora_A.0.weight
model.layers.8.mlp.up_proj.lora_A.1.weight
model.layers.8.mlp.up_proj.lora_B.0.weight
model.layers.8.mlp.up_proj.lora_B.1.weight
model.layers.8.mlp.down_proj.lora_A.0.weight
model.layers.8.mlp.down_proj.lora_A.1.weight
model.layers.8.mlp.down_proj.lora_B.0.weight
model.layers.8.mlp.down_proj.lora_B.1.weight
model.layers.9.self_attn.q_proj.lora_A.0.weight
model.layers.9.self_attn.q_proj.lora_A.1.weight
model.layers.9.self_attn.q_proj.lora_B.0.weight
model.layers.9.self_attn.q_proj.lora_B.1.weight
model.layers.9.self_attn.k_proj.lora_A.0.weight
model.layers.9.self_attn.k_proj.lora_A.1.weight
model.layers.9.self_attn.k_proj.lora_B.0.weight
model.layers.9.self_attn.k_proj.lora_B.1.weight
model.layers.9.self_attn.v_proj.lora_A.0.weight
model.layers.9.self_attn.v_proj.lora_A.1.weight
model.layers.9.self_attn.v_proj.lora_B.0.weight
model.layers.9.self_attn.v_proj.lora_B.1.weight
model.layers.9.self_attn.o_proj.lora_A.0.weight
model.layers.9.self_attn.o_proj.lora_A.1.weight
model.layers.9.self_attn.o_proj.lora_B.0.weight
model.layers.9.self_attn.o_proj.lora_B.1.weight
model.layers.9.mlp.gate_proj.lora_A.0.weight
model.layers.9.mlp.gate_proj.lora_A.1.weight
model.layers.9.mlp.gate_proj.lora_B.0.weight
model.layers.9.mlp.gate_proj.lora_B.1.weight
model.layers.9.mlp.up_proj.lora_A.0.weight
model.layers.9.mlp.up_proj.lora_A.1.weight
model.layers.9.mlp.up_proj.lora_B.0.weight
model.layers.9.mlp.up_proj.lora_B.1.weight
model.layers.9.mlp.down_proj.lora_A.0.weight
model.layers.9.mlp.down_proj.lora_A.1.weight
model.layers.9.mlp.down_proj.lora_B.0.weight
model.layers.9.mlp.down_proj.lora_B.1.weight
model.layers.10.self_attn.q_proj.lora_A.0.weight
model.layers.10.self_attn.q_proj.lora_A.1.weight
model.layers.10.self_attn.q_proj.lora_B.0.weight
model.layers.10.self_attn.q_proj.lora_B.1.weight
model.layers.10.self_attn.k_proj.lora_A.0.weight
model.layers.10.self_attn.k_proj.lora_A.1.weight
model.layers.10.self_attn.k_proj.lora_B.0.weight
model.layers.10.self_attn.k_proj.lora_B.1.weight
model.layers.10.self_attn.v_proj.lora_A.0.weight
model.layers.10.self_attn.v_proj.lora_A.1.weight
model.layers.10.self_attn.v_proj.lora_B.0.weight
model.layers.10.self_attn.v_proj.lora_B.1.weight
model.layers.10.self_attn.o_proj.lora_A.0.weight
model.layers.10.self_attn.o_proj.lora_A.1.weight
model.layers.10.self_attn.o_proj.lora_B.0.weight
model.layers.10.self_attn.o_proj.lora_B.1.weight
model.layers.10.mlp.gate_proj.lora_A.0.weight
model.layers.10.mlp.gate_proj.lora_A.1.weight
model.layers.10.mlp.gate_proj.lora_B.0.weight
model.layers.10.mlp.gate_proj.lora_B.1.weight
model.layers.10.mlp.up_proj.lora_A.0.weight
model.layers.10.mlp.up_proj.lora_A.1.weight
model.layers.10.mlp.up_proj.lora_B.0.weight
model.layers.10.mlp.up_proj.lora_B.1.weight
model.layers.10.mlp.down_proj.lora_A.0.weight
model.layers.10.mlp.down_proj.lora_A.1.weight
model.layers.10.mlp.down_proj.lora_B.0.weight
model.layers.10.mlp.down_proj.lora_B.1.weight
model.layers.11.self_attn.q_proj.lora_A.0.weight
model.layers.11.self_attn.q_proj.lora_A.1.weight
model.layers.11.self_attn.q_proj.lora_B.0.weight
model.layers.11.self_attn.q_proj.lora_B.1.weight
model.layers.11.self_attn.k_proj.lora_A.0.weight
model.layers.11.self_attn.k_proj.lora_A.1.weight
model.layers.11.self_attn.k_proj.lora_B.0.weight
model.layers.11.self_attn.k_proj.lora_B.1.weight
model.layers.11.self_attn.v_proj.lora_A.0.weight
model.layers.11.self_attn.v_proj.lora_A.1.weight
model.layers.11.self_attn.v_proj.lora_B.0.weight
model.layers.11.self_attn.v_proj.lora_B.1.weight
model.layers.11.self_attn.o_proj.lora_A.0.weight
model.layers.11.self_attn.o_proj.lora_A.1.weight
model.layers.11.self_attn.o_proj.lora_B.0.weight
model.layers.11.self_attn.o_proj.lora_B.1.weight
model.layers.11.mlp.gate_proj.lora_A.0.weight
model.layers.11.mlp.gate_proj.lora_A.1.weight
model.layers.11.mlp.gate_proj.lora_B.0.weight
model.layers.11.mlp.gate_proj.lora_B.1.weight
model.layers.11.mlp.up_proj.lora_A.0.weight
model.layers.11.mlp.up_proj.lora_A.1.weight
model.layers.11.mlp.up_proj.lora_B.0.weight
model.layers.11.mlp.up_proj.lora_B.1.weight
model.layers.11.mlp.down_proj.lora_A.0.weight
model.layers.11.mlp.down_proj.lora_A.1.weight
model.layers.11.mlp.down_proj.lora_B.0.weight
model.layers.11.mlp.down_proj.lora_B.1.weight
model.layers.12.self_attn.q_proj.lora_A.0.weight
model.layers.12.self_attn.q_proj.lora_A.1.weight
model.layers.12.self_attn.q_proj.lora_B.0.weight
model.layers.12.self_attn.q_proj.lora_B.1.weight
model.layers.12.self_attn.k_proj.lora_A.0.weight
model.layers.12.self_attn.k_proj.lora_A.1.weight
model.layers.12.self_attn.k_proj.lora_B.0.weight
model.layers.12.self_attn.k_proj.lora_B.1.weight
model.layers.12.self_attn.v_proj.lora_A.0.weight
model.layers.12.self_attn.v_proj.lora_A.1.weight
model.layers.12.self_attn.v_proj.lora_B.0.weight
model.layers.12.self_attn.v_proj.lora_B.1.weight
model.layers.12.self_attn.o_proj.lora_A.0.weight
model.layers.12.self_attn.o_proj.lora_A.1.weight
model.layers.12.self_attn.o_proj.lora_B.0.weight
model.layers.12.self_attn.o_proj.lora_B.1.weight
model.layers.12.mlp.gate_proj.lora_A.0.weight
model.layers.12.mlp.gate_proj.lora_A.1.weight
model.layers.12.mlp.gate_proj.lora_B.0.weight
model.layers.12.mlp.gate_proj.lora_B.1.weight
model.layers.12.mlp.up_proj.lora_A.0.weight
model.layers.12.mlp.up_proj.lora_A.1.weight
model.layers.12.mlp.up_proj.lora_B.0.weight
model.layers.12.mlp.up_proj.lora_B.1.weight
model.layers.12.mlp.down_proj.lora_A.0.weight
model.layers.12.mlp.down_proj.lora_A.1.weight
model.layers.12.mlp.down_proj.lora_B.0.weight
model.layers.12.mlp.down_proj.lora_B.1.weight
model.layers.13.self_attn.q_proj.lora_A.0.weight
model.layers.13.self_attn.q_proj.lora_A.1.weight
model.layers.13.self_attn.q_proj.lora_B.0.weight
model.layers.13.self_attn.q_proj.lora_B.1.weight
model.layers.13.self_attn.k_proj.lora_A.0.weight
model.layers.13.self_attn.k_proj.lora_A.1.weight
model.layers.13.self_attn.k_proj.lora_B.0.weight
model.layers.13.self_attn.k_proj.lora_B.1.weight
model.layers.13.self_attn.v_proj.lora_A.0.weight
model.layers.13.self_attn.v_proj.lora_A.1.weight
model.layers.13.self_attn.v_proj.lora_B.0.weight
model.layers.13.self_attn.v_proj.lora_B.1.weight
model.layers.13.self_attn.o_proj.lora_A.0.weight
model.layers.13.self_attn.o_proj.lora_A.1.weight
model.layers.13.self_attn.o_proj.lora_B.0.weight
model.layers.13.self_attn.o_proj.lora_B.1.weight
model.layers.13.mlp.gate_proj.lora_A.0.weight
model.layers.13.mlp.gate_proj.lora_A.1.weight
model.layers.13.mlp.gate_proj.lora_B.0.weight
model.layers.13.mlp.gate_proj.lora_B.1.weight
model.layers.13.mlp.up_proj.lora_A.0.weight
model.layers.13.mlp.up_proj.lora_A.1.weight
model.layers.13.mlp.up_proj.lora_B.0.weight
model.layers.13.mlp.up_proj.lora_B.1.weight
model.layers.13.mlp.down_proj.lora_A.0.weight
model.layers.13.mlp.down_proj.lora_A.1.weight
model.layers.13.mlp.down_proj.lora_B.0.weight
model.layers.13.mlp.down_proj.lora_B.1.weight
model.layers.14.self_attn.q_proj.lora_A.0.weight
model.layers.14.self_attn.q_proj.lora_A.1.weight
model.layers.14.self_attn.q_proj.lora_B.0.weight
model.layers.14.self_attn.q_proj.lora_B.1.weight
model.layers.14.self_attn.k_proj.lora_A.0.weight
model.layers.14.self_attn.k_proj.lora_A.1.weight
model.layers.14.self_attn.k_proj.lora_B.0.weight
model.layers.14.self_attn.k_proj.lora_B.1.weight
model.layers.14.self_attn.v_proj.lora_A.0.weight
model.layers.14.self_attn.v_proj.lora_A.1.weight
model.layers.14.self_attn.v_proj.lora_B.0.weight
model.layers.14.self_attn.v_proj.lora_B.1.weight
model.layers.14.self_attn.o_proj.lora_A.0.weight
model.layers.14.self_attn.o_proj.lora_A.1.weight
model.layers.14.self_attn.o_proj.lora_B.0.weight
model.layers.14.self_attn.o_proj.lora_B.1.weight
model.layers.14.mlp.gate_proj.lora_A.0.weight
model.layers.14.mlp.gate_proj.lora_A.1.weight
model.layers.14.mlp.gate_proj.lora_B.0.weight
model.layers.14.mlp.gate_proj.lora_B.1.weight
model.layers.14.mlp.up_proj.lora_A.0.weight
model.layers.14.mlp.up_proj.lora_A.1.weight
model.layers.14.mlp.up_proj.lora_B.0.weight
model.layers.14.mlp.up_proj.lora_B.1.weight
model.layers.14.mlp.down_proj.lora_A.0.weight
model.layers.14.mlp.down_proj.lora_A.1.weight
model.layers.14.mlp.down_proj.lora_B.0.weight
model.layers.14.mlp.down_proj.lora_B.1.weight
model.layers.15.self_attn.q_proj.lora_A.0.weight
model.layers.15.self_attn.q_proj.lora_A.1.weight
model.layers.15.self_attn.q_proj.lora_B.0.weight
model.layers.15.self_attn.q_proj.lora_B.1.weight
model.layers.15.self_attn.k_proj.lora_A.0.weight
model.layers.15.self_attn.k_proj.lora_A.1.weight
model.layers.15.self_attn.k_proj.lora_B.0.weight
model.layers.15.self_attn.k_proj.lora_B.1.weight
model.layers.15.self_attn.v_proj.lora_A.0.weight
model.layers.15.self_attn.v_proj.lora_A.1.weight
model.layers.15.self_attn.v_proj.lora_B.0.weight
model.layers.15.self_attn.v_proj.lora_B.1.weight
model.layers.15.self_attn.o_proj.lora_A.0.weight
model.layers.15.self_attn.o_proj.lora_A.1.weight
model.layers.15.self_attn.o_proj.lora_B.0.weight
model.layers.15.self_attn.o_proj.lora_B.1.weight
model.layers.15.mlp.gate_proj.lora_A.0.weight
model.layers.15.mlp.gate_proj.lora_A.1.weight
model.layers.15.mlp.gate_proj.lora_B.0.weight
model.layers.15.mlp.gate_proj.lora_B.1.weight
model.layers.15.mlp.up_proj.lora_A.0.weight
model.layers.15.mlp.up_proj.lora_A.1.weight
model.layers.15.mlp.up_proj.lora_B.0.weight
model.layers.15.mlp.up_proj.lora_B.1.weight
model.layers.15.mlp.down_proj.lora_A.0.weight
model.layers.15.mlp.down_proj.lora_A.1.weight
model.layers.15.mlp.down_proj.lora_B.0.weight
model.layers.15.mlp.down_proj.lora_B.1.weight
model.layers.16.self_attn.q_proj.lora_A.0.weight
model.layers.16.self_attn.q_proj.lora_A.1.weight
model.layers.16.self_attn.q_proj.lora_B.0.weight
model.layers.16.self_attn.q_proj.lora_B.1.weight
model.layers.16.self_attn.k_proj.lora_A.0.weight
model.layers.16.self_attn.k_proj.lora_A.1.weight
model.layers.16.self_attn.k_proj.lora_B.0.weight
model.layers.16.self_attn.k_proj.lora_B.1.weight
model.layers.16.self_attn.v_proj.lora_A.0.weight
model.layers.16.self_attn.v_proj.lora_A.1.weight
model.layers.16.self_attn.v_proj.lora_B.0.weight
model.layers.16.self_attn.v_proj.lora_B.1.weight
model.layers.16.self_attn.o_proj.lora_A.0.weight
model.layers.16.self_attn.o_proj.lora_A.1.weight
model.layers.16.self_attn.o_proj.lora_B.0.weight
model.layers.16.self_attn.o_proj.lora_B.1.weight
model.layers.16.mlp.gate_proj.lora_A.0.weight
model.layers.16.mlp.gate_proj.lora_A.1.weight
model.layers.16.mlp.gate_proj.lora_B.0.weight
model.layers.16.mlp.gate_proj.lora_B.1.weight
model.layers.16.mlp.up_proj.lora_A.0.weight
model.layers.16.mlp.up_proj.lora_A.1.weight
model.layers.16.mlp.up_proj.lora_B.0.weight
model.layers.16.mlp.up_proj.lora_B.1.weight
model.layers.16.mlp.down_proj.lora_A.0.weight
model.layers.16.mlp.down_proj.lora_A.1.weight
model.layers.16.mlp.down_proj.lora_B.0.weight
model.layers.16.mlp.down_proj.lora_B.1.weight
model.layers.17.self_attn.q_proj.lora_A.0.weight
model.layers.17.self_attn.q_proj.lora_A.1.weight
model.layers.17.self_attn.q_proj.lora_B.0.weight
model.layers.17.self_attn.q_proj.lora_B.1.weight
model.layers.17.self_attn.k_proj.lora_A.0.weight
model.layers.17.self_attn.k_proj.lora_A.1.weight
model.layers.17.self_attn.k_proj.lora_B.0.weight
model.layers.17.self_attn.k_proj.lora_B.1.weight
model.layers.17.self_attn.v_proj.lora_A.0.weight
model.layers.17.self_attn.v_proj.lora_A.1.weight
model.layers.17.self_attn.v_proj.lora_B.0.weight
model.layers.17.self_attn.v_proj.lora_B.1.weight
model.layers.17.self_attn.o_proj.lora_A.0.weight
model.layers.17.self_attn.o_proj.lora_A.1.weight
model.layers.17.self_attn.o_proj.lora_B.0.weight
model.layers.17.self_attn.o_proj.lora_B.1.weight
model.layers.17.mlp.gate_proj.lora_A.0.weight
model.layers.17.mlp.gate_proj.lora_A.1.weight
model.layers.17.mlp.gate_proj.lora_B.0.weight
model.layers.17.mlp.gate_proj.lora_B.1.weight
model.layers.17.mlp.up_proj.lora_A.0.weight
model.layers.17.mlp.up_proj.lora_A.1.weight
model.layers.17.mlp.up_proj.lora_B.0.weight
model.layers.17.mlp.up_proj.lora_B.1.weight
model.layers.17.mlp.down_proj.lora_A.0.weight
model.layers.17.mlp.down_proj.lora_A.1.weight
model.layers.17.mlp.down_proj.lora_B.0.weight
model.layers.17.mlp.down_proj.lora_B.1.weight
model.layers.18.self_attn.q_proj.lora_A.0.weight
model.layers.18.self_attn.q_proj.lora_A.1.weight
model.layers.18.self_attn.q_proj.lora_B.0.weight
model.layers.18.self_attn.q_proj.lora_B.1.weight
model.layers.18.self_attn.k_proj.lora_A.0.weight
model.layers.18.self_attn.k_proj.lora_A.1.weight
model.layers.18.self_attn.k_proj.lora_B.0.weight
model.layers.18.self_attn.k_proj.lora_B.1.weight
model.layers.18.self_attn.v_proj.lora_A.0.weight
model.layers.18.self_attn.v_proj.lora_A.1.weight
model.layers.18.self_attn.v_proj.lora_B.0.weight
model.layers.18.self_attn.v_proj.lora_B.1.weight
model.layers.18.self_attn.o_proj.lora_A.0.weight
model.layers.18.self_attn.o_proj.lora_A.1.weight
model.layers.18.self_attn.o_proj.lora_B.0.weight
model.layers.18.self_attn.o_proj.lora_B.1.weight
model.layers.18.mlp.gate_proj.lora_A.0.weight
model.layers.18.mlp.gate_proj.lora_A.1.weight
model.layers.18.mlp.gate_proj.lora_B.0.weight
model.layers.18.mlp.gate_proj.lora_B.1.weight
model.layers.18.mlp.up_proj.lora_A.0.weight
model.layers.18.mlp.up_proj.lora_A.1.weight
model.layers.18.mlp.up_proj.lora_B.0.weight
model.layers.18.mlp.up_proj.lora_B.1.weight
model.layers.18.mlp.down_proj.lora_A.0.weight
model.layers.18.mlp.down_proj.lora_A.1.weight
model.layers.18.mlp.down_proj.lora_B.0.weight
model.layers.18.mlp.down_proj.lora_B.1.weight
model.layers.19.self_attn.q_proj.lora_A.0.weight
model.layers.19.self_attn.q_proj.lora_A.1.weight
model.layers.19.self_attn.q_proj.lora_B.0.weight
model.layers.19.self_attn.q_proj.lora_B.1.weight
model.layers.19.self_attn.k_proj.lora_A.0.weight
model.layers.19.self_attn.k_proj.lora_A.1.weight
model.layers.19.self_attn.k_proj.lora_B.0.weight
model.layers.19.self_attn.k_proj.lora_B.1.weight
model.layers.19.self_attn.v_proj.lora_A.0.weight
model.layers.19.self_attn.v_proj.lora_A.1.weight
model.layers.19.self_attn.v_proj.lora_B.0.weight
model.layers.19.self_attn.v_proj.lora_B.1.weight
model.layers.19.self_attn.o_proj.lora_A.0.weight
model.layers.19.self_attn.o_proj.lora_A.1.weight
model.layers.19.self_attn.o_proj.lora_B.0.weight
model.layers.19.self_attn.o_proj.lora_B.1.weight
model.layers.19.mlp.gate_proj.lora_A.0.weight
model.layers.19.mlp.gate_proj.lora_A.1.weight
model.layers.19.mlp.gate_proj.lora_B.0.weight
model.layers.19.mlp.gate_proj.lora_B.1.weight
model.layers.19.mlp.up_proj.lora_A.0.weight
model.layers.19.mlp.up_proj.lora_A.1.weight
model.layers.19.mlp.up_proj.lora_B.0.weight
model.layers.19.mlp.up_proj.lora_B.1.weight
model.layers.19.mlp.down_proj.lora_A.0.weight
model.layers.19.mlp.down_proj.lora_A.1.weight
model.layers.19.mlp.down_proj.lora_B.0.weight
model.layers.19.mlp.down_proj.lora_B.1.weight
model.layers.20.self_attn.q_proj.lora_A.0.weight
model.layers.20.self_attn.q_proj.lora_A.1.weight
model.layers.20.self_attn.q_proj.lora_B.0.weight
model.layers.20.self_attn.q_proj.lora_B.1.weight
model.layers.20.self_attn.k_proj.lora_A.0.weight
model.layers.20.self_attn.k_proj.lora_A.1.weight
model.layers.20.self_attn.k_proj.lora_B.0.weight
model.layers.20.self_attn.k_proj.lora_B.1.weight
model.layers.20.self_attn.v_proj.lora_A.0.weight
model.layers.20.self_attn.v_proj.lora_A.1.weight
model.layers.20.self_attn.v_proj.lora_B.0.weight
model.layers.20.self_attn.v_proj.lora_B.1.weight
model.layers.20.self_attn.o_proj.lora_A.0.weight
model.layers.20.self_attn.o_proj.lora_A.1.weight
model.layers.20.self_attn.o_proj.lora_B.0.weight
model.layers.20.self_attn.o_proj.lora_B.1.weight
model.layers.20.mlp.gate_proj.lora_A.0.weight
model.layers.20.mlp.gate_proj.lora_A.1.weight
model.layers.20.mlp.gate_proj.lora_B.0.weight
model.layers.20.mlp.gate_proj.lora_B.1.weight
model.layers.20.mlp.up_proj.lora_A.0.weight
model.layers.20.mlp.up_proj.lora_A.1.weight
model.layers.20.mlp.up_proj.lora_B.0.weight
model.layers.20.mlp.up_proj.lora_B.1.weight
model.layers.20.mlp.down_proj.lora_A.0.weight
model.layers.20.mlp.down_proj.lora_A.1.weight
model.layers.20.mlp.down_proj.lora_B.0.weight
model.layers.20.mlp.down_proj.lora_B.1.weight
model.layers.21.self_attn.q_proj.lora_A.0.weight
model.layers.21.self_attn.q_proj.lora_A.1.weight
model.layers.21.self_attn.q_proj.lora_B.0.weight
model.layers.21.self_attn.q_proj.lora_B.1.weight
model.layers.21.self_attn.k_proj.lora_A.0.weight
model.layers.21.self_attn.k_proj.lora_A.1.weight
model.layers.21.self_attn.k_proj.lora_B.0.weight
model.layers.21.self_attn.k_proj.lora_B.1.weight
model.layers.21.self_attn.v_proj.lora_A.0.weight
model.layers.21.self_attn.v_proj.lora_A.1.weight
model.layers.21.self_attn.v_proj.lora_B.0.weight
model.layers.21.self_attn.v_proj.lora_B.1.weight
model.layers.21.self_attn.o_proj.lora_A.0.weight
model.layers.21.self_attn.o_proj.lora_A.1.weight
model.layers.21.self_attn.o_proj.lora_B.0.weight
model.layers.21.self_attn.o_proj.lora_B.1.weight
model.layers.21.mlp.gate_proj.lora_A.0.weight
model.layers.21.mlp.gate_proj.lora_A.1.weight
model.layers.21.mlp.gate_proj.lora_B.0.weight
model.layers.21.mlp.gate_proj.lora_B.1.weight
model.layers.21.mlp.up_proj.lora_A.0.weight
model.layers.21.mlp.up_proj.lora_A.1.weight
model.layers.21.mlp.up_proj.lora_B.0.weight
model.layers.21.mlp.up_proj.lora_B.1.weight
model.layers.21.mlp.down_proj.lora_A.0.weight
model.layers.21.mlp.down_proj.lora_A.1.weight
model.layers.21.mlp.down_proj.lora_B.0.weight
model.layers.21.mlp.down_proj.lora_B.1.weight
model.layers.22.self_attn.q_proj.lora_A.0.weight
model.layers.22.self_attn.q_proj.lora_A.1.weight
model.layers.22.self_attn.q_proj.lora_B.0.weight
model.layers.22.self_attn.q_proj.lora_B.1.weight
model.layers.22.self_attn.k_proj.lora_A.0.weight
model.layers.22.self_attn.k_proj.lora_A.1.weight
model.layers.22.self_attn.k_proj.lora_B.0.weight
model.layers.22.self_attn.k_proj.lora_B.1.weight
model.layers.22.self_attn.v_proj.lora_A.0.weight
model.layers.22.self_attn.v_proj.lora_A.1.weight
model.layers.22.self_attn.v_proj.lora_B.0.weight
model.layers.22.self_attn.v_proj.lora_B.1.weight
model.layers.22.self_attn.o_proj.lora_A.0.weight
model.layers.22.self_attn.o_proj.lora_A.1.weight
model.layers.22.self_attn.o_proj.lora_B.0.weight
model.layers.22.self_attn.o_proj.lora_B.1.weight
model.layers.22.mlp.gate_proj.lora_A.0.weight
model.layers.22.mlp.gate_proj.lora_A.1.weight
model.layers.22.mlp.gate_proj.lora_B.0.weight
model.layers.22.mlp.gate_proj.lora_B.1.weight
model.layers.22.mlp.up_proj.lora_A.0.weight
model.layers.22.mlp.up_proj.lora_A.1.weight
model.layers.22.mlp.up_proj.lora_B.0.weight
model.layers.22.mlp.up_proj.lora_B.1.weight
model.layers.22.mlp.down_proj.lora_A.0.weight
model.layers.22.mlp.down_proj.lora_A.1.weight
model.layers.22.mlp.down_proj.lora_B.0.weight
model.layers.22.mlp.down_proj.lora_B.1.weight
model.layers.23.self_attn.q_proj.lora_A.0.weight
model.layers.23.self_attn.q_proj.lora_A.1.weight
model.layers.23.self_attn.q_proj.lora_B.0.weight
model.layers.23.self_attn.q_proj.lora_B.1.weight
model.layers.23.self_attn.k_proj.lora_A.0.weight
model.layers.23.self_attn.k_proj.lora_A.1.weight
model.layers.23.self_attn.k_proj.lora_B.0.weight
model.layers.23.self_attn.k_proj.lora_B.1.weight
model.layers.23.self_attn.v_proj.lora_A.0.weight
model.layers.23.self_attn.v_proj.lora_A.1.weight
model.layers.23.self_attn.v_proj.lora_B.0.weight
model.layers.23.self_attn.v_proj.lora_B.1.weight
model.layers.23.self_attn.o_proj.lora_A.0.weight
model.layers.23.self_attn.o_proj.lora_A.1.weight
model.layers.23.self_attn.o_proj.lora_B.0.weight
model.layers.23.self_attn.o_proj.lora_B.1.weight
model.layers.23.mlp.gate_proj.lora_A.0.weight
model.layers.23.mlp.gate_proj.lora_A.1.weight
model.layers.23.mlp.gate_proj.lora_B.0.weight
model.layers.23.mlp.gate_proj.lora_B.1.weight
model.layers.23.mlp.up_proj.lora_A.0.weight
model.layers.23.mlp.up_proj.lora_A.1.weight
model.layers.23.mlp.up_proj.lora_B.0.weight
model.layers.23.mlp.up_proj.lora_B.1.weight
model.layers.23.mlp.down_proj.lora_A.0.weight
model.layers.23.mlp.down_proj.lora_A.1.weight
model.layers.23.mlp.down_proj.lora_B.0.weight
model.layers.23.mlp.down_proj.lora_B.1.weight
model.layers.24.self_attn.q_proj.lora_A.0.weight
model.layers.24.self_attn.q_proj.lora_A.1.weight
model.layers.24.self_attn.q_proj.lora_B.0.weight
model.layers.24.self_attn.q_proj.lora_B.1.weight
model.layers.24.self_attn.k_proj.lora_A.0.weight
model.layers.24.self_attn.k_proj.lora_A.1.weight
model.layers.24.self_attn.k_proj.lora_B.0.weight
model.layers.24.self_attn.k_proj.lora_B.1.weight
model.layers.24.self_attn.v_proj.lora_A.0.weight
model.layers.24.self_attn.v_proj.lora_A.1.weight
model.layers.24.self_attn.v_proj.lora_B.0.weight
model.layers.24.self_attn.v_proj.lora_B.1.weight
model.layers.24.self_attn.o_proj.lora_A.0.weight
model.layers.24.self_attn.o_proj.lora_A.1.weight
model.layers.24.self_attn.o_proj.lora_B.0.weight
model.layers.24.self_attn.o_proj.lora_B.1.weight
model.layers.24.mlp.gate_proj.lora_A.0.weight
model.layers.24.mlp.gate_proj.lora_A.1.weight
model.layers.24.mlp.gate_proj.lora_B.0.weight
model.layers.24.mlp.gate_proj.lora_B.1.weight
model.layers.24.mlp.up_proj.lora_A.0.weight
model.layers.24.mlp.up_proj.lora_A.1.weight
model.layers.24.mlp.up_proj.lora_B.0.weight
model.layers.24.mlp.up_proj.lora_B.1.weight
model.layers.24.mlp.down_proj.lora_A.0.weight
model.layers.24.mlp.down_proj.lora_A.1.weight
model.layers.24.mlp.down_proj.lora_B.0.weight
model.layers.24.mlp.down_proj.lora_B.1.weight
model.layers.25.self_attn.q_proj.lora_A.0.weight
model.layers.25.self_attn.q_proj.lora_A.1.weight
model.layers.25.self_attn.q_proj.lora_B.0.weight
model.layers.25.self_attn.q_proj.lora_B.1.weight
model.layers.25.self_attn.k_proj.lora_A.0.weight
model.layers.25.self_attn.k_proj.lora_A.1.weight
model.layers.25.self_attn.k_proj.lora_B.0.weight
model.layers.25.self_attn.k_proj.lora_B.1.weight
model.layers.25.self_attn.v_proj.lora_A.0.weight
model.layers.25.self_attn.v_proj.lora_A.1.weight
model.layers.25.self_attn.v_proj.lora_B.0.weight
model.layers.25.self_attn.v_proj.lora_B.1.weight
model.layers.25.self_attn.o_proj.lora_A.0.weight
model.layers.25.self_attn.o_proj.lora_A.1.weight
model.layers.25.self_attn.o_proj.lora_B.0.weight
model.layers.25.self_attn.o_proj.lora_B.1.weight
model.layers.25.mlp.gate_proj.lora_A.0.weight
model.layers.25.mlp.gate_proj.lora_A.1.weight
model.layers.25.mlp.gate_proj.lora_B.0.weight
model.layers.25.mlp.gate_proj.lora_B.1.weight
model.layers.25.mlp.up_proj.lora_A.0.weight
model.layers.25.mlp.up_proj.lora_A.1.weight
model.layers.25.mlp.up_proj.lora_B.0.weight
model.layers.25.mlp.up_proj.lora_B.1.weight
model.layers.25.mlp.down_proj.lora_A.0.weight
model.layers.25.mlp.down_proj.lora_A.1.weight
model.layers.25.mlp.down_proj.lora_B.0.weight
model.layers.25.mlp.down_proj.lora_B.1.weight
model.layers.26.self_attn.q_proj.lora_A.0.weight
model.layers.26.self_attn.q_proj.lora_A.1.weight
model.layers.26.self_attn.q_proj.lora_B.0.weight
model.layers.26.self_attn.q_proj.lora_B.1.weight
model.layers.26.self_attn.k_proj.lora_A.0.weight
model.layers.26.self_attn.k_proj.lora_A.1.weight
model.layers.26.self_attn.k_proj.lora_B.0.weight
model.layers.26.self_attn.k_proj.lora_B.1.weight
model.layers.26.self_attn.v_proj.lora_A.0.weight
model.layers.26.self_attn.v_proj.lora_A.1.weight
model.layers.26.self_attn.v_proj.lora_B.0.weight
model.layers.26.self_attn.v_proj.lora_B.1.weight
model.layers.26.self_attn.o_proj.lora_A.0.weight
model.layers.26.self_attn.o_proj.lora_A.1.weight
model.layers.26.self_attn.o_proj.lora_B.0.weight
model.layers.26.self_attn.o_proj.lora_B.1.weight
model.layers.26.mlp.gate_proj.lora_A.0.weight
model.layers.26.mlp.gate_proj.lora_A.1.weight
model.layers.26.mlp.gate_proj.lora_B.0.weight
model.layers.26.mlp.gate_proj.lora_B.1.weight
model.layers.26.mlp.up_proj.lora_A.0.weight
model.layers.26.mlp.up_proj.lora_A.1.weight
model.layers.26.mlp.up_proj.lora_B.0.weight
model.layers.26.mlp.up_proj.lora_B.1.weight
model.layers.26.mlp.down_proj.lora_A.0.weight
model.layers.26.mlp.down_proj.lora_A.1.weight
model.layers.26.mlp.down_proj.lora_B.0.weight
model.layers.26.mlp.down_proj.lora_B.1.weight
model.layers.27.self_attn.q_proj.lora_A.0.weight
model.layers.27.self_attn.q_proj.lora_A.1.weight
model.layers.27.self_attn.q_proj.lora_B.0.weight
model.layers.27.self_attn.q_proj.lora_B.1.weight
model.layers.27.self_attn.k_proj.lora_A.0.weight
model.layers.27.self_attn.k_proj.lora_A.1.weight
model.layers.27.self_attn.k_proj.lora_B.0.weight
model.layers.27.self_attn.k_proj.lora_B.1.weight
model.layers.27.self_attn.v_proj.lora_A.0.weight
model.layers.27.self_attn.v_proj.lora_A.1.weight
model.layers.27.self_attn.v_proj.lora_B.0.weight
model.layers.27.self_attn.v_proj.lora_B.1.weight
model.layers.27.self_attn.o_proj.lora_A.0.weight
model.layers.27.self_attn.o_proj.lora_A.1.weight
model.layers.27.self_attn.o_proj.lora_B.0.weight
model.layers.27.self_attn.o_proj.lora_B.1.weight
model.layers.27.mlp.gate_proj.lora_A.0.weight
model.layers.27.mlp.gate_proj.lora_A.1.weight
model.layers.27.mlp.gate_proj.lora_B.0.weight
model.layers.27.mlp.gate_proj.lora_B.1.weight
model.layers.27.mlp.up_proj.lora_A.0.weight
model.layers.27.mlp.up_proj.lora_A.1.weight
model.layers.27.mlp.up_proj.lora_B.0.weight
model.layers.27.mlp.up_proj.lora_B.1.weight
model.layers.27.mlp.down_proj.lora_A.0.weight
model.layers.27.mlp.down_proj.lora_A.1.weight
model.layers.27.mlp.down_proj.lora_B.0.weight
model.layers.27.mlp.down_proj.lora_B.1.weight
model.layers.28.self_attn.q_proj.lora_A.0.weight
model.layers.28.self_attn.q_proj.lora_A.1.weight
model.layers.28.self_attn.q_proj.lora_B.0.weight
model.layers.28.self_attn.q_proj.lora_B.1.weight
model.layers.28.self_attn.k_proj.lora_A.0.weight
model.layers.28.self_attn.k_proj.lora_A.1.weight
model.layers.28.self_attn.k_proj.lora_B.0.weight
model.layers.28.self_attn.k_proj.lora_B.1.weight
model.layers.28.self_attn.v_proj.lora_A.0.weight
model.layers.28.self_attn.v_proj.lora_A.1.weight
model.layers.28.self_attn.v_proj.lora_B.0.weight
model.layers.28.self_attn.v_proj.lora_B.1.weight
model.layers.28.self_attn.o_proj.lora_A.0.weight
model.layers.28.self_attn.o_proj.lora_A.1.weight
model.layers.28.self_attn.o_proj.lora_B.0.weight
model.layers.28.self_attn.o_proj.lora_B.1.weight
model.layers.28.mlp.gate_proj.lora_A.0.weight
model.layers.28.mlp.gate_proj.lora_A.1.weight
model.layers.28.mlp.gate_proj.lora_B.0.weight
model.layers.28.mlp.gate_proj.lora_B.1.weight
model.layers.28.mlp.up_proj.lora_A.0.weight
model.layers.28.mlp.up_proj.lora_A.1.weight
model.layers.28.mlp.up_proj.lora_B.0.weight
model.layers.28.mlp.up_proj.lora_B.1.weight
model.layers.28.mlp.down_proj.lora_A.0.weight
model.layers.28.mlp.down_proj.lora_A.1.weight
model.layers.28.mlp.down_proj.lora_B.0.weight
model.layers.28.mlp.down_proj.lora_B.1.weight
model.layers.29.self_attn.q_proj.lora_A.0.weight
model.layers.29.self_attn.q_proj.lora_A.1.weight
model.layers.29.self_attn.q_proj.lora_B.0.weight
model.layers.29.self_attn.q_proj.lora_B.1.weight
model.layers.29.self_attn.k_proj.lora_A.0.weight
model.layers.29.self_attn.k_proj.lora_A.1.weight
model.layers.29.self_attn.k_proj.lora_B.0.weight
model.layers.29.self_attn.k_proj.lora_B.1.weight
model.layers.29.self_attn.v_proj.lora_A.0.weight
model.layers.29.self_attn.v_proj.lora_A.1.weight
model.layers.29.self_attn.v_proj.lora_B.0.weight
model.layers.29.self_attn.v_proj.lora_B.1.weight
model.layers.29.self_attn.o_proj.lora_A.0.weight
model.layers.29.self_attn.o_proj.lora_A.1.weight
model.layers.29.self_attn.o_proj.lora_B.0.weight
model.layers.29.self_attn.o_proj.lora_B.1.weight
model.layers.29.mlp.gate_proj.lora_A.0.weight
model.layers.29.mlp.gate_proj.lora_A.1.weight
model.layers.29.mlp.gate_proj.lora_B.0.weight
model.layers.29.mlp.gate_proj.lora_B.1.weight
model.layers.29.mlp.up_proj.lora_A.0.weight
model.layers.29.mlp.up_proj.lora_A.1.weight
model.layers.29.mlp.up_proj.lora_B.0.weight
model.layers.29.mlp.up_proj.lora_B.1.weight
model.layers.29.mlp.down_proj.lora_A.0.weight
model.layers.29.mlp.down_proj.lora_A.1.weight
model.layers.29.mlp.down_proj.lora_B.0.weight
model.layers.29.mlp.down_proj.lora_B.1.weight
model.layers.30.self_attn.q_proj.lora_A.0.weight
model.layers.30.self_attn.q_proj.lora_A.1.weight
model.layers.30.self_attn.q_proj.lora_B.0.weight
model.layers.30.self_attn.q_proj.lora_B.1.weight
model.layers.30.self_attn.k_proj.lora_A.0.weight
model.layers.30.self_attn.k_proj.lora_A.1.weight
model.layers.30.self_attn.k_proj.lora_B.0.weight
model.layers.30.self_attn.k_proj.lora_B.1.weight
model.layers.30.self_attn.v_proj.lora_A.0.weight
model.layers.30.self_attn.v_proj.lora_A.1.weight
model.layers.30.self_attn.v_proj.lora_B.0.weight
model.layers.30.self_attn.v_proj.lora_B.1.weight
model.layers.30.self_attn.o_proj.lora_A.0.weight
model.layers.30.self_attn.o_proj.lora_A.1.weight
model.layers.30.self_attn.o_proj.lora_B.0.weight
model.layers.30.self_attn.o_proj.lora_B.1.weight
model.layers.30.mlp.gate_proj.lora_A.0.weight
model.layers.30.mlp.gate_proj.lora_A.1.weight
model.layers.30.mlp.gate_proj.lora_B.0.weight
model.layers.30.mlp.gate_proj.lora_B.1.weight
model.layers.30.mlp.up_proj.lora_A.0.weight
model.layers.30.mlp.up_proj.lora_A.1.weight
model.layers.30.mlp.up_proj.lora_B.0.weight
model.layers.30.mlp.up_proj.lora_B.1.weight
model.layers.30.mlp.down_proj.lora_A.0.weight
model.layers.30.mlp.down_proj.lora_A.1.weight
model.layers.30.mlp.down_proj.lora_B.0.weight
model.layers.30.mlp.down_proj.lora_B.1.weight
model.layers.31.self_attn.q_proj.lora_A.0.weight
model.layers.31.self_attn.q_proj.lora_A.1.weight
model.layers.31.self_attn.q_proj.lora_B.0.weight
model.layers.31.self_attn.q_proj.lora_B.1.weight
model.layers.31.self_attn.k_proj.lora_A.0.weight
model.layers.31.self_attn.k_proj.lora_A.1.weight
model.layers.31.self_attn.k_proj.lora_B.0.weight
model.layers.31.self_attn.k_proj.lora_B.1.weight
model.layers.31.self_attn.v_proj.lora_A.0.weight
model.layers.31.self_attn.v_proj.lora_A.1.weight
model.layers.31.self_attn.v_proj.lora_B.0.weight
model.layers.31.self_attn.v_proj.lora_B.1.weight
model.layers.31.self_attn.o_proj.lora_A.0.weight
model.layers.31.self_attn.o_proj.lora_A.1.weight
model.layers.31.self_attn.o_proj.lora_B.0.weight
model.layers.31.self_attn.o_proj.lora_B.1.weight
model.layers.31.mlp.gate_proj.lora_A.0.weight
model.layers.31.mlp.gate_proj.lora_A.1.weight
model.layers.31.mlp.gate_proj.lora_B.0.weight
model.layers.31.mlp.gate_proj.lora_B.1.weight
model.layers.31.mlp.up_proj.lora_A.0.weight
model.layers.31.mlp.up_proj.lora_A.1.weight
model.layers.31.mlp.up_proj.lora_B.0.weight
model.layers.31.mlp.up_proj.lora_B.1.weight
model.layers.31.mlp.down_proj.lora_A.0.weight
model.layers.31.mlp.down_proj.lora_A.1.weight
model.layers.31.mlp.down_proj.lora_B.0.weight
model.layers.31.mlp.down_proj.lora_B.1.weight
----------------------

And then, just after that, I run the SFTTrainer, which prints exactly:

Using auto half precision backend
Currently training with a batch size of: 2
***** Running training *****
  Num examples = 1,053
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 16
  Total optimization steps = 32
  Number of trainable parameters = 118,372,800
Detected flash_attn version: 2.6.3
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None

benjamin-marie avatar Aug 19 '24 15:08 benjamin-marie

Thanks @benjamin-marie. The internal_xlora_classifier does not appear among the trainable parameters, while the LoRA weights do appear even though they should be frozen, right @EricLBuehler?
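
In the meantime, a manual workaround along these lines might restore the intended setup; this is only a sketch based on the parameter names printed above, not a verified fix:

for name, param in xlora_model.named_parameters():
    if "internal_xlora_classifier" in name:
        param.requires_grad = True   # the X-LoRA classifier is what should be trained
    elif "lora_A" in name or "lora_B" in name:
        param.requires_grad = False  # the individual LoRA adapters stay frozen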

BenjaminBossan avatar Aug 19 '24 16:08 BenjaminBossan

Yes, exactly. I'll try to reproduce and fix this!

EricLBuehler avatar Aug 19 '24 16:08 EricLBuehler

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Sep 17 '24 15:09 github-actions[bot]

Not stale

EricLBuehler avatar Sep 17 '24 15:09 EricLBuehler
