
[FEAT] Integrate WoRA (Filtering-WoRA) into PEFT

Open JT-Sun opened this issue 6 months ago • 7 comments

Feature request

Paper: https://arxiv.org/pdf/2404.10292
Reference code: https://github.com/JT-Sun/Filtering-WoRA

Motivation (WoRA)

We propose integrating WoRA as a new PEFT adapter: a lightweight, mergeable method that (1) learns a weighted direction blending the pretrained weights with a low-rank update and (2) decouples magnitude from direction for stability and control, as described in the WWW '25 accepted paper.

Key innovations:

  • Weighted direction (learnable α, β). Learn the trade-off between the base direction and the low-rank update before normalization.
  • Normalized update. Column-wise normalization of the combined direction yields stable optimization.
  • Drop-in & mergeable. Same training/inference ergonomics as LoRA-style adapters; merge/unmerge is supported.

Method (WoRA)

We follow DoRA's magnitude–direction decoupling (the magnitude m is not our contribution). Our contribution is the weighted direction with learnable α, β before normalization.
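For concreteness, here is a rough PyTorch sketch of the update as I understand it from the description above. This is not the authors' implementation; the class and parameter names are hypothetical, and α, β are modeled as single scalars per adapted module (as in the linked code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WoRALinear(nn.Module):
    """Hypothetical sketch of a WoRA-wrapped linear layer.

    W' = m * (alpha * W0 + beta * B @ A) / ||alpha * W0 + beta * B @ A||_col
    """

    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # pretrained W0 stays frozen
        out_features, in_features = base.weight.shape
        # the usual low-rank branch (LoRA-style init: B = 0)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        # learnable blend weights for the direction (scalars in this sketch)
        self.alpha = nn.Parameter(torch.ones(()))
        self.beta = nn.Parameter(torch.ones(()))
        # magnitude, initialized to the column norms of W0 as in DoRA
        self.m = nn.Parameter(base.weight.norm(p=2, dim=0, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W0 = self.base.weight
        # weighted direction, then column-wise normalization
        direction = self.alpha * W0 + self.beta * (self.B @ self.A)
        col_norm = direction.norm(p=2, dim=0, keepdim=True)
        W = self.m * direction / col_norm
        return F.linear(x, W, self.base.bias)
</parameter>```

At initialization this reduces to the base layer (B is zero, α = β = 1, and m matches W0's column norms), so training starts from the pretrained weights, as with LoRA and DoRA.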

Figures

WoRA methodology: [figure]

Geometric view (weighted direction with α, β): [figure]

Performance comparison: [figures]

Your contribution

The implementation in https://github.com/JT-Sun/Filtering-WoRA is based on peft, and we would be pleased to submit a pull request; any suggestions or guidance are welcome.

JT-Sun avatar Oct 23 '25 02:10 JT-Sun

Thanks for proposing to add WoRA. IIUC, this would be strictly focused on the PEFT method, not the data curation described in the paper. The PEFT method is basically DoRA with two additional, learnable scalars that are applied to W_0 and BA. As such, it should be relatively easy to add as a LoRA variant. Regarding the linked repo, I couldn't find the WoRA code in there, could you please give me a pointer?

BenjaminBossan avatar Oct 23 '25 09:10 BenjaminBossan

Thanks for the quick review! Yes, this proposal is strictly about the PEFT method, not the data-curation part of the paper. Sorry the earlier link wasn't specific; the WoRA code is here: https://github.com/JT-Sun/Filtering-WoRA/blob/main/models/swin_transformer.py (see the classes WoRALayer, LinearWithWoRA, and LinearWithWoRAMerged).

  • α and β are learnable (scalars in this implementation; can be extended to per-head/per-channel in PEFT). LoRA/DoRA are special cases: fixing α, β recovers them, but WoRA learns the trade-off with negligible extra params.
  • The learnable (α, β) lets the adapter move further from or closer to W_0 as needed, with stable optimization thanks to column-wise normalization, while staying PEFT-light (two extra scalar parameters per adapted module plus the usual low-rank branch).
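To make the special-case claim concrete, the update can be written as follows. This is my reading of the thread, not a formula quoted from the paper; m is the DoRA-style magnitude and ‖·‖_c the column-wise norm.

```latex
% WoRA update (hedged reconstruction from the discussion above):
W' = m \cdot \frac{\alpha W_0 + \beta BA}{\lVert \alpha W_0 + \beta BA \rVert_c}
% Fixing \alpha = \beta = 1 recovers DoRA's decomposition.
% Dropping m and the normalization, and fixing \alpha = 1, recovers LoRA:
% W' = W_0 + \beta BA.
```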

If this direction looks good, I can open a PR to add that code. Thanks again!

JT-Sun avatar Oct 23 '25 12:10 JT-Sun

Thanks for clarifying. The direction looks good. Implementation-wise, I would suggest implementing this as a "LoRA variant", the same way that DoRA is implemented.

BenjaminBossan avatar Oct 23 '25 16:10 BenjaminBossan

Hi @BenjaminBossan and @JT-Sun,

First off, thank you @JT-Sun for your detailed and insightful work on improving DoRA — it’s truly inspiring and empirically well-grounded. While going through your approach, I found it so compelling that I decided to implement it myself to better understand the nuances, and have now raised a PR reflecting that work.

I noticed you mentioned possibly raising a PR for the same improvement. If you haven’t already begun or would be open to collaborating, I’d love to join efforts and refine this together. If, however, you already have an implementation in progress, I’d be happy to close my PR and help push your version forward instead.

Either way, I deeply value the work you’ve done and would appreciate any feedback or opportunity to contribute collaboratively.

sambhavnoobcoder avatar Oct 26 '25 20:10 sambhavnoobcoder

Thanks @sambhavnoobcoder for implementing the WoRA integration. However, in the future, please ask first if the other person has already started working, as otherwise there could be a lot of duplicate effort.

Let's wait for @JT-Sun's reply, if they have a separate PR, I would give it priority as they proposed it first and are author of the paper. Still, your contribution is appreciated, Sambhav.

BenjaminBossan avatar Oct 27 '25 11:10 BenjaminBossan

Thanks, @BenjaminBossan — I agree with prioritizing the original authors' work. I implemented WoRA as an academic exercise to better understand the paper and pushed a PR here; to avoid duplicate effort I won't push any further changes, and I'll close my PR if the original authors open their own implementation.

sambhavnoobcoder avatar Oct 27 '25 17:10 sambhavnoobcoder

@JT-Sun did you have time to check this yet?

BenjaminBossan avatar Nov 14 '25 10:11 BenjaminBossan