DeepSpeed
[BUG] Fails to finetune a certain subset of parameters via torch.optim.AdamW in code (not the .json setting)
I added one more LoRA layer by hand (without peft) to a pretrained multi-modal model in order to finetune it on new data. I want DeepSpeed to optimize ONLY the parameters of the LoRA layer rather than all of the parameters, like this:
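(The original snippet is not reproduced here; below is a minimal sketch of the intent, assuming the hand-added LoRA weights can be identified by a hypothetical `"lora_"` substring in their parameter names.)

```python
import torch

def build_lora_optimizer(model, lr=1e-4, weight_decay=0.0):
    # Collect only the hand-added LoRA parameters by name.
    lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
    # Plain PyTorch AdamW over the LoRA subset only.
    return torch.optim.AdamW(lora_params, lr=lr, weight_decay=weight_decay)
```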
The platform is Hugging Face's transformers together with DeepSpeed.
Therefore I subclass the Trainer from HF's transformers, as below:
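(Again, a rough sketch rather than the exact code I used: the idea is to override `Trainer.create_optimizer` so that only the LoRA parameters, identified here by the assumed `"lora_"` naming, are handed to AdamW.)

```python
import torch
from transformers import Trainer

class LoraOnlyTrainer(Trainer):
    def create_optimizer(self):
        if self.optimizer is None:
            # Only pass the trainable LoRA parameters to the optimizer.
            lora_params = [
                p for n, p in self.model.named_parameters()
                if "lora_" in n and p.requires_grad
            ]
            self.optimizer = torch.optim.AdamW(
                lora_params, lr=self.args.learning_rate
            )
        return self.optimizer
```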
Unfortunately, it doesn't work: neither the LoRA weights nor the non-LoRA weights change during training. It seems that the optimizer DeepSpeed actually uses is not the same as the one I constructed with PyTorch.
My question is: is there any way to finetune ONLY a certain subnet's parameters with DeepSpeed + Transformers' Trainer?
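For reference, one workaround I'm considering (a sketch, again assuming the `"lora_"` naming convention) is to freeze all non-LoRA parameters before building the Trainer, so that whatever optimizer the Trainer/DeepSpeed integration constructs only receives the trainable LoRA weights:

```python
# Freeze everything except the hand-added LoRA parameters.
for name, param in model.named_parameters():
    param.requires_grad = "lora_" in name
```

Is this the intended way to restrict DeepSpeed to a parameter subset, or is there a supported path for passing a custom optimizer/parameter group through the Trainer when DeepSpeed is enabled?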