How to fine-tune unstructured sparse models with LoRA?
It's a simple but great piece of work! As the title says, I don't quite understand how the model is fine-tuned with LoRA after unstructured pruning. Is it fine-tuned in a form similar to W' = W + m ⊙ (AB), where m is the (C_out, C_in) mask matrix? My main concern is that the additional LoRA weights would significantly reduce the sparsity of the model. If I've misunderstood anything, I'd very much appreciate a correction!
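To make the question concrete, here is a rough PyTorch sketch of the masked-merge form I have in mind (purely illustrative; the shapes, the mask, and the re-masking of the LoRA update are my own assumptions, not code from this repo):

```python
import torch

C_out, C_in, r = 128, 256, 8

W = torch.randn(C_out, C_in)
mask = (torch.rand(C_out, C_in) > 0.5).float()  # hypothetical unstructured pruning mask m
W = W * mask                                    # pruned weight

A = torch.randn(r, C_in) * 0.01                 # LoRA down-projection
B = torch.zeros(C_out, r)                       # LoRA up-projection (zero-initialized)

# Masked merge: re-apply the mask to the LoRA update so the merged weight
# keeps exactly the original sparsity pattern.
W_merged = W + mask * (B @ A)
```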
We keep the LoRA adapter separate from the main network, which is a common option for LoRA fine-tuning. The computational overhead from this separate computation is minimal, since the adapter is lightweight.
I have the same question. Yes, the separate computation for LoRA is minimal during fine-tuning. However, in the inference phase the LoRA weights are typically merged into the pruned weights, and the additional LoRA weights would then significantly reduce the sparsity of the model. I guess this is the problem @an-yongqi was asking about.
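To illustrate the concern, here is a quick back-of-the-envelope check (all shapes and values are made up, not from this repo) of what happens to sparsity once a dense B @ A is merged into a pruned weight:

```python
import torch

C_out, C_in, r = 128, 256, 8

W = torch.randn(C_out, C_in)
W = W * (torch.rand(C_out, C_in) > 0.5).float()   # ~50% unstructured sparsity

A = torch.randn(r, C_in)
B = torch.randn(C_out, r)

def sparsity(t):
    return (t == 0).float().mean().item()

print(f"sparsity before merge: {sparsity(W):.2%}")          # ~50%
print(f"sparsity after merge:  {sparsity(W + B @ A):.2%}")  # ~0%, since B @ A is dense
```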
Can we get the code for the LoRA fine-tuning?
That's a good point. However, applying LoRA does not necessarily require adding the product of A and B to the pretrained weights of a large language model. One could instead keep A and B separate, multiply the activation by A and then by B, and add this result to the output h. This adheres to the original LoRA formulation, and while it introduces some runtime computation overhead, it does not affect the sparsity of the model's weights.
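For concreteness, here is a minimal sketch of what such an unmerged forward pass could look like; the module layout, shapes, and hyperparameters are illustrative assumptions, not the exact fine-tuning code of this repo:

```python
import torch
import torch.nn as nn

class SparseLinearWithLoRA(nn.Module):
    def __init__(self, sparse_weight: torch.Tensor, r: int = 8, alpha: float = 16.0):
        super().__init__()
        out_f, in_f = sparse_weight.shape
        # Frozen pruned weight; its zero entries are never touched.
        self.weight = nn.Parameter(sparse_weight, requires_grad=False)
        # Trainable low-rank adapter, kept separate from the main weight.
        self.lora_A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))
        self.scaling = alpha / r

    def forward(self, x):
        h = x @ self.weight.T                                         # sparse main path
        h = h + (x @ self.lora_A.T) @ self.lora_B.T * self.scaling    # separate LoRA path
        return h
```

At inference one can simply keep this unmerged form (two extra small matmuls per layer), so the zeros in W are never filled in.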
Did the author use the unstructured pruning mask to fine-tune the pruned model?