DeepSpeed
[BUG] Fails to finetune a certain subset of parameters via torch.optim.AdamW in code (not the .json setting)
I added one more LoRA layer by hand (without peft) to a pretrained multi-modal model in order to finetune it on new data. I want DeepSpeed to optimize ONLY the parameters of the LoRA layer rather than all of the parameters, like this:
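(The original snippet is not reproduced here; below is a minimal sketch of the intent, assuming the hand-added LoRA weights can be identified by a hypothetical `"lora_"` substring in their parameter names.)

```python
import torch

def build_lora_optimizer(model, lr=1e-4, weight_decay=0.0):
    # Collect only the hand-added LoRA parameters by name.
    lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
    # Plain PyTorch AdamW over the LoRA subset only.
    return torch.optim.AdamW(lora_params, lr=lr, weight_decay=weight_decay)
```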
The platform is Hugging Face's transformers together with DeepSpeed.
Therefore I subclass the Trainer from HF's transformers, as below:
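(Again, a rough sketch rather than the exact code I used: the idea is to override `Trainer.create_optimizer` so that only the LoRA parameters, identified here by the assumed `"lora_"` naming, are handed to AdamW.)

```python
import torch
from transformers import Trainer

class LoraOnlyTrainer(Trainer):
    def create_optimizer(self):
        if self.optimizer is None:
            # Only pass the trainable LoRA parameters to the optimizer.
            lora_params = [
                p for n, p in self.model.named_parameters()
                if "lora_" in n and p.requires_grad
            ]
            self.optimizer = torch.optim.AdamW(
                lora_params, lr=self.args.learning_rate
            )
        return self.optimizer
```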
Unfortunately, it doesn't work: neither the LoRA weights nor the non-LoRA weights change during training. It seems that the optimizer DeepSpeed actually uses is not the same as the one I constructed with PyTorch.
My question is: is there any way to finetune ONLY a certain subnet's parameters with DeepSpeed + Transformers' Trainer?
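For reference, one workaround I'm considering (a sketch, again assuming the `"lora_"` naming convention) is to freeze all non-LoRA parameters before building the Trainer, so that whatever optimizer the Trainer/DeepSpeed integration constructs only receives the trainable LoRA weights:

```python
# Freeze everything except the hand-added LoRA parameters.
for name, param in model.named_parameters():
    param.requires_grad = "lora_" in name
```

Is this the intended way to restrict DeepSpeed to a parameter subset, or is there a supported path for passing a custom optimizer/parameter group through the Trainer when DeepSpeed is enabled?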