[Feat]: LoKr Support
Describe your use-case.
https://arxiv.org/pdf/2309.14859
User reports from using other trainers:
VRAM/performance cost is on par with LoRA training, while delivering better quality than traditional LoRAs, and without the ~30% performance penalty from the extra math required by DoRA; it sits in between the two. In very large runs (>60k), LyCORIS/LoKr (and DoRA) is where users claim it hits a sweet spot.
KohakuBlueleaf uses it for his projects (his finetune was the basis for Illustrious). Most who have tried it vouch for its improvements so far; another user mentioned not seeing an improvement in his tests, but neither was it in any way worse than PEFT (he mostly uses very small datasets, sub-30 images).
LoKr LoRAs are standard on the SimpleTuner Discord. For me it has proved itself over and over, and I only do LoRAs with LoKr (with the exception of SDXL).
Examples:
- this is just for SD 3.5.
- the results for Flux are insane.
What would you like to see as a solution?
Necessary steps:
- write a small guide on hyperparameters etc.
- mathematical implementation seems rather trivial. Maybe 30 lines in LoRAModule.py, as it's already well-abstracted
- what's necessary to generate LoKr-files that are supported by popular inference tools?
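To give a feel for the math the second bullet refers to: LoKr parameterizes the weight update as a Kronecker product, ΔW = w1 ⊗ w2, where the larger factor can itself be stored as a low-rank pair. A rough NumPy sketch (the shapes and the alpha/rank scaling follow my reading of the LyCORIS paper, not any existing OneTrainer code):

```python
import numpy as np

def lokr_delta(w1, w2_a, w2_b, alpha, rank):
    """Reconstruct the LoKr update: scale * (w1 kron (w2_a @ w2_b))."""
    scale = alpha / rank
    return scale * np.kron(w1, w2_a @ w2_b)

# Example: a 16x32 layer factored as (4x4) kron (4x8), with the second
# Kronecker factor itself split into a rank-2 pair.
rank = 2
w1 = np.random.randn(4, 4)        # small Kronecker factor, trained directly
w2_a = np.random.randn(4, rank)   # low-rank split of the large factor
w2_b = np.random.randn(rank, 8)
delta = lokr_delta(w1, w2_a, w2_b, alpha=rank, rank=rank)
print(delta.shape)  # (16, 32)
```

Note how few parameters this needs: 16 + 8 + 16 values here versus 512 for the full 16x32 update.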
Have you considered alternatives? List them here.
LoRA, LoHA
I tried LoKr on kohya for Flux. It has better multi-concept training (still far from non-bleeding), but lower quality in my experiments.
I believe, and have found from my own testing, that kohya's LoKr implementation is inferior to SimpleTuner's and ai-toolkit's.
could be.
Those SD 3.5 (Large and Medium) LoKr LoRAs are mine on the Discord, made with ai-toolkit; they don't look lower quality to me, and it's SD 3.5! If you only use kohya, what you say makes sense. Bleeding is stopped with a proper dataset and captions.
reference implementation with key suffixes that seem to be supported by inference tools: https://github.com/KohakuBlueleaf/LyCORIS/blob/main/lycoris/modules/lokr.py
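From my reading of that file, each module's Kronecker factor is stored either as a full matrix (suffixes `lokr_w1` / `lokr_w2`) or as a low-rank pair (`lokr_w1_a`/`lokr_w1_b`, `lokr_w2_a`/`lokr_w2_b`), plus an `alpha` scalar. A loader sketch (key names and scaling are my interpretation of the LyCORIS source, worth double-checking against real checkpoints):

```python
import numpy as np

def rebuild_lokr_weight(sd, prefix):
    """Rebuild the Delta W for one module from LoKr state-dict entries."""
    def factor(name):
        # A factor is stored either whole, or as a low-rank a/b pair.
        full = sd.get(f"{prefix}.{name}")
        if full is not None:
            return full
        return sd[f"{prefix}.{name}_a"] @ sd[f"{prefix}.{name}_b"]

    w1 = factor("lokr_w1")
    w2 = factor("lokr_w2")
    # When the second factor is decomposed, LyCORIS appears to scale by
    # alpha / rank; with a full factor the scale is effectively 1.
    if f"{prefix}.lokr_w2_a" in sd:
        rank = sd[f"{prefix}.lokr_w2_a"].shape[1]
    else:
        rank = 1
    alpha = float(sd.get(f"{prefix}.alpha", rank))
    return (alpha / rank) * np.kron(w1, w2)

# Tiny synthetic example: w1 stored whole, w2 stored as a rank-1 pair.
sd = {
    "m.lokr_w1": np.ones((2, 2)),
    "m.lokr_w2_a": np.ones((3, 1)),
    "m.lokr_w2_b": np.ones((1, 4)),
    "m.alpha": np.array(1.0),
}
print(rebuild_lokr_weight(sd, "m").shape)  # (6, 8)
```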
Any insight on why that might be? They both seem to be using just the LyCORIS code. The only thing I've found SimpleTuner doing in addition is initializing the network differently: https://github.com/bghira/SimpleTuner/blob/d107b2bc17618fc64c5ef7f5478ccd64148a51e5/helpers/training/trainer.py#L729
How are you using this parameter init_lokr_norm in practice? What impact does it have?
I think it's not so much the LyCORIS lib as it is kohya the trainer. I see better results in SimpleTuner and ai-toolkit. Why is that? I don't have a clue.
It would be pretty nice to have LoKr support. kohya's implementation is okay; SimpleTuner's seems pretty well done.
We'd also need to be able to load a custom LoKr config, in which you can control the factors per module as well as for the model overall:
https://github.com/bghira/SimpleTuner/blob/main/config/lycoris_config.json.example
https://github.com/KohakuBlueleaf/LyCORIS
https://github.com/KohakuBlueleaf/LyCORIS/blob/main/lycoris/modules/lokr.py
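For reference, SimpleTuner's LyCORIS config looks roughly like the fragment below; the field names are taken from the linked example file, but they may differ between versions, so treat this as illustrative rather than authoritative:

```json
{
  "algo": "lokr",
  "multiplier": 1.0,
  "linear_dim": 10000,
  "linear_alpha": 1,
  "factor": 16,
  "apply_preset": {
    "target_module": ["Attention", "FeedForward"],
    "module_algo_map": {
      "Attention": { "factor": 16 },
      "FeedForward": { "factor": 8 }
    }
  }
}
```

The per-module `module_algo_map` is what the comment above means by controlling the factor for each module separately.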
Can anyone who uses LoKr provide your smallest-possible testcase that shows LoKr working as you expect?
Some public dataset and config that, ideally, works with LoKr but fails with standard LoRA. This would both serve as motivation to implement LoKr into OneTrainer, and as a testcase to test the implementation.
Caith made a LoKr before; I struggle with concept bleeding in regular LoRA training:
https://civitai.com/models/714292/juno-for-flux-by-caith-overwatch-2
I think in general it works better. Here are some key points:
- Less prone to overfitting.
- File size is controllable via the factor setting.
- When using bypass_mode it saves VRAM, and trains faster and better with batch size.
- Multi-character and multi-concept training: I found it better when using prior model prediction and when training two characters together.
- Full matrix mode: I think it's similar to full finetuning with dim 10000.
Also, I'm not sure about conv_dim and conv_alpha. They work well for training on SDXL; not sure about Flux, but it would be good to have.
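On the file-size point above: the factor controls how each layer dimension is split between the two Kronecker factors, which directly sets the parameter count. A rough illustration (the divisor search is a simplified version of LyCORIS's factorization helper, so treat the exact split as an assumption):

```python
def factorization(dim, factor=-1):
    """Split dim into (m, n) with m * n == dim, picking the largest
    divisor m not exceeding `factor` (or sqrt(dim) when factor <= 0)."""
    target = int(dim ** 0.5) if factor <= 0 else factor
    m = 1
    for d in range(1, target + 1):
        if dim % d == 0:
            m = d
    return m, dim // m

def lokr_params(out_f, in_f, factor, rank):
    """Parameter count: the (m1 x m2) factor is trained in full, the
    (n1 x n2) factor is stored as a rank-`rank` pair."""
    m1, n1 = factorization(out_f, factor)
    m2, n2 = factorization(in_f, factor)
    return m1 * m2 + rank * (n1 + n2)

# A 3072x3072 linear layer: LoKr at factor 16 vs a plain rank-16 LoRA.
print(lokr_params(3072, 3072, factor=16, rank=16))  # 6400
print(2 * 3072 * 16)                                # 98304
```

So for the same nominal rank, the LoKr module here is over an order of magnitude smaller; a larger factor shrinks the full block and grows the low-rank one.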
I'm going to stress Dxq's point here: to convince Nero to merge a resulting PR, you need to provide a reproducible dataset (with configs) and/or a conclusive experiment where it's clearly demonstrated, with evidence, that LoKr works (or is a large improvement) where traditional LoRA does not; judging from past comments, he isn't convinced by anecdotes.
This means a large number of sweeps with traditional LoRA and LoKr, and many samples. LoKr has come up in the past, and not a single person has been able to provide conclusive proof for Nero to observe. DoRA had proof; that's why it got merged.
P.S. I am not making a judgement on the effectiveness of LoKr, just explaining how to maximise the chances. Please keep noise, hearsay, and anecdotal claims in this thread to a minimum, as they're not helpful.
I wouldn't put it as strongly, given the many anecdotal reports that LoKr works better, among them from clearly very experienced trainers. A testcase and your hyperparameters will still be useful.