
How to fine-tune unstructured sparse models with LoRA?

Open · an-yongqi opened this issue 2 years ago · 5 comments

It's a simple but great piece of work! As the title says, I don't quite understand how to fine-tune a model with LoRA after unstructured pruning. Is it fine-tuned in a form similar to W = W + m ⊙ (AB), where m is the pruning mask of shape (C_out, C_in)? My main concern is that the additional LoRA weights will significantly reduce the sparsity of the model. If I've misunderstood anything, I'm very much looking forward to your corrections!

an-yongqi commented on Jun 28, 2023

We keep the LoRA adapter separate from the main network, which is a common option in LoRA fine-tuning. The computational overhead induced by keeping it separate is minimal, since the adapter is lightweight.

Eric-mingjie commented on Jun 29, 2023
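(For illustration: a minimal PyTorch sketch, not the authors' code, of what "keeping the LoRA adapter separate" can look like. The pruned base weight stays frozen, so its zeros are never disturbed; only the small A and B matrices are trained. The class and parameter names here are made up for the example.)

```python
import torch
import torch.nn as nn

class SparseLinearWithLoRA(nn.Module):
    """Frozen, pruned linear layer plus a separate (unmerged) LoRA adapter."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base                         # weights already pruned (contain zeros)
        self.base.weight.requires_grad_(False)   # sparsity is never modified during fine-tuning
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # h = W_pruned x + scaling * B (A x); the adapter is never merged into W
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```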

I have the same question. Yes, the separate computation for LoRA is minimal. However, at inference time the LoRA weights are typically merged into the pruned weights, so the additional LoRA weights will significantly reduce the model's sparsity. I guess that is what @an-yongqi was asking about.

kiseliu commented on Aug 2, 2023

Can we get the code for LoRA fine-tuning?

pkulium commented on Aug 29, 2023

> I have the same question. Yes, the separate computation for LoRA is minimal. However, at inference time the LoRA weights are typically merged into the pruned weights, so the additional LoRA weights will significantly reduce the model's sparsity. I guess that is what @an-yongqi was asking about.

That's a good point. However, applying LoRA does not necessarily require adding the product of A and B to the pretrained weights of a large language model. One could instead multiply the activation by A and then B, and add that result to h. This adheres to the original LoRA formulation; while it introduces some runtime computation overhead, it should not drastically affect the sparsity of the overall model.

azurespace commented on Nov 6, 2023
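(To make azurespace's point concrete, here is a rough sketch, with made-up shapes, comparing the two options: keeping the adapter unmerged at inference, versus merging it with the pruning mask re-applied, which is the W + m ⊙ (AB) idea from the original question.)

```python
import torch

out_f, in_f, r = 512, 512, 8
W = torch.randn(out_f, in_f)
mask = (torch.rand_like(W) > 0.5).float()   # 1 = kept weight, 0 = pruned
W_pruned = W * mask
A = torch.randn(r, in_f) * 0.01
B = torch.randn(out_f, r) * 0.01            # nonzero only for illustration
x = torch.randn(4, in_f)

# Option 1: keep the adapter separate -- the sparsity of W_pruned is untouched.
h_unmerged = x @ W_pruned.T + (x @ A.T) @ B.T

# Option 2: merge, but re-apply the pruning mask so the merged weights stay sparse.
W_merged = W_pruned + mask * (B @ A)
h_merged = x @ W_merged.T

print(torch.allclose(h_unmerged, h_merged))  # False in general: masking drops part of the LoRA update
```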

> We keep the LoRA adapter separate from the main network, which is a common option in LoRA fine-tuning. The computational overhead induced by keeping it separate is minimal, since the adapter is lightweight.

Did the authors use the unstructured pruning mask when fine-tuning the pruned model?

wyxscir commented on Jan 26, 2024