[FEAT] Integrate WoRA (Filtering-WoRA) into PEFT
Feature request
Paper: https://arxiv.org/pdf/2404.10292
Reference code: https://github.com/JT-Sun/Filtering-WoRA
Motivation (WoRA)
We propose integrating WoRA as a new PEFT adapter: a lightweight, mergeable method that (1) learns a weighted direction blending the pretrained weights with a low-rank update and (2) decouples magnitude from direction for stability and control, as described in our paper accepted at WWW 2025.
Key innovations:
- Weighted direction (learnable α, β). Learn the trade-off between the base direction and the low-rank update before normalization.
- Normalized update. Column-wise normalization of the combined direction yields stable optimization.
- Drop-in & mergeable. Same training/inference ergonomics as LoRA-style adapters; merge/unmerge is supported.
Method (WoRA)
We follow DoRA’s magnitude–direction decoupling (the magnitude vector m is not our contribution). Our contribution is the weighted direction with learnable α and β, applied before normalization.
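In DoRA’s notation, the update then takes roughly the following form (a sketch based on the description above; DoRA is recovered at α = β = 1):

$$
W' = m \cdot \frac{\alpha W_0 + \beta BA}{\lVert \alpha W_0 + \beta BA \rVert_c}
$$

where $W_0$ is the frozen pretrained weight, $BA$ is the low-rank update, $m$ is the learnable magnitude vector, and $\lVert \cdot \rVert_c$ denotes the column-wise norm.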
Figures (images omitted): WoRA methodology overview; geometric view of the weighted direction with α, β; performance comparison.
Your contribution
The implementation in https://github.com/JT-Sun/Filtering-WoRA is based on PEFT, and we would be pleased to submit a pull request; any suggestions or guidance are welcome.
Thanks for proposing to add WoRA. IIUC, this would be strictly focused on the PEFT method, not the data curation described in the paper. The PEFT method is basically DoRA with two additional, learnable scalars that are applied to W_0 and BA. As such, it should be relatively easy to add as a LoRA variant. Regarding the linked repo, I couldn't find the WoRA code in there, could you please give me a pointer?
Thanks for the quick review! Yes, this proposal is strictly about the PEFT method, not the data-curation part of the paper. Sorry, the earlier link was not specific enough. Here is our WoRA code: https://github.com/JT-Sun/Filtering-WoRA/blob/main/models/swin_transformer.py (see the classes WoRALayer, LinearWithWoRA, and LinearWithWoRAMerged)
- α and β are learnable (scalars in this implementation; they could be extended to per-head or per-channel weights in PEFT). LoRA/DoRA are special cases: fixing α and β recovers them, while WoRA learns the trade-off with negligible extra parameters.
- The learnable (α, β) lets the adapter move closer to or further from W_0 as needed, with stable optimization thanks to the column-wise normalization, while staying PEFT-light: two small scalars per adapted module plus the usual low-rank branch (see the sketch below).
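For concreteness, here is a minimal, self-contained PyTorch sketch of the idea (simplified from the linked implementation; the class name `WoRALinear` and the `merge` helper are illustrative, not the final PEFT API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WoRALinear(nn.Module):
    """Minimal WoRA sketch: weighted direction with learnable alpha/beta,
    column-wise normalization, and a learnable magnitude vector (as in DoRA)."""

    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base  # frozen pretrained layer; weight has shape (out, in)
        for p in self.base.parameters():
            p.requires_grad = False
        out_features, in_features = base.weight.shape
        # LoRA-style low-rank branch: delta_W = B @ A
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        # WoRA's learnable scalars weighting W_0 and BA before normalization
        self.alpha = nn.Parameter(torch.ones(()))
        self.beta = nn.Parameter(torch.ones(()))
        # Magnitude vector m, initialized to the column norms of W_0 (as in DoRA)
        self.m = nn.Parameter(base.weight.norm(p=2, dim=0, keepdim=True))

    def _direction(self) -> torch.Tensor:
        # Weighted direction: alpha * W_0 + beta * (B @ A)
        return self.alpha * self.base.weight + self.beta * (self.lora_B @ self.lora_A)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self._direction()
        # Column-wise normalization, then rescale by the learnable magnitude
        w = self.m * v / v.norm(p=2, dim=0, keepdim=True)
        return F.linear(x, w, self.base.bias)

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        # Fold the adapter into a plain Linear for inference
        merged = nn.Linear(self.base.in_features, self.base.out_features,
                           bias=self.base.bias is not None)
        v = self._direction()
        merged.weight.copy_(self.m * v / v.norm(p=2, dim=0, keepdim=True))
        if self.base.bias is not None:
            merged.bias.copy_(self.base.bias)
        return merged
```

Setting `alpha` and `beta` to 1 and freezing them recovers the DoRA update.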
If this direction looks good, I can open a PR to add that code. Thanks again!
Thanks for clarifying. The direction looks good. Implementation-wise, I would suggest implementing this as a "LoRA variant", the same way that DoRA is implemented.
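For illustration, the user-facing API could then mirror DoRA's, e.g. with a hypothetical `use_wora` flag on `LoraConfig` (this flag does not exist yet; `use_dora=True` is the existing analogue):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Hypothetical `use_wora` flag, mirroring the existing `use_dora=True` option
config = LoraConfig(r=8, target_modules=["q_proj", "v_proj"], use_wora=True)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```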
Hi @BenjaminBossan and @JT-Sun,
First off, thank you @JT-Sun for your detailed and insightful work on improving DoRA — it’s truly inspiring and empirically well-grounded. While going through your approach, I found it so compelling that I decided to implement it myself to better understand the nuances, and have now raised a PR reflecting that work.
I noticed you mentioned possibly raising a PR for the same improvement. If you haven’t already begun or would be open to collaborating, I’d love to join efforts and refine this together. If, however, you already have an implementation in progress, I’d be happy to close my PR and help push your version forward instead.
Either way, I deeply value the work you’ve done and would appreciate any feedback or opportunity to contribute collaboratively.
Thanks @sambhavnoobcoder for implementing the WoRA integration. However, in the future, please ask first if the other person has already started working, as otherwise there could be a lot of duplicate effort.
Let's wait for @JT-Sun's reply, if they have a separate PR, I would give it priority as they proposed it first and are author of the paper. Still, your contribution is appreciated, Sambhav.
Thanks, @BenjaminBossan. I agree with prioritizing the original authors' work. I implemented WoRA as an academic exercise to better understand the paper and pushed a PR here. To avoid duplicate effort, I won't push any further changes, and I'll close my PR if the original authors open their own implementation.
@JT-Sun did you have time to check this yet?