Apply LoRA by model patching
Rewrite LoRA application to use model patching, which gives us two benefits (see the sketch after this list):
- During model execution, the forward pass runs only on the patched model weights, while with hooks we have to compute the output of the model and of each LoRA separately.
- Since the LoRA weights are merged into the model weights, there is no need to keep the LoRAs in VRAM during inference.
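To illustrate the idea, here is a minimal sketch of weight patching in PyTorch. The `{layer_name: (up, down)}` LoRA layout and the helper names are assumptions for illustration, not InvokeAI's actual implementation:

```python
import torch

def patch_lora_weights(model: torch.nn.Module, loras: list[dict], scales: list[float]) -> dict:
    """Merge LoRA deltas directly into the model weights.

    Returns the original weights so they can be restored after generation.
    Assumed LoRA layout: {layer_name: (up, down)} with up (out, rank), down (rank, in).
    """
    originals: dict[str, torch.Tensor] = {}
    for lora, scale in zip(loras, scales):
        for key, (up, down) in lora.items():
            module = model.get_submodule(key)
            if key not in originals:
                originals[key] = module.weight.detach().clone()
            # W' = W + scale * (up @ down): after this, the forward pass costs the
            # same as the plain model, and the LoRA tensors can be freed from VRAM.
            delta = (up @ down).to(device=module.weight.device, dtype=module.weight.dtype)
            module.weight.data += scale * delta
    return originals

def unpatch_lora_weights(model: torch.nn.Module, originals: dict) -> None:
    """Restore the pre-patch weights once generation finishes."""
    for key, weight in originals.items():
        model.get_submodule(key).weight.data.copy_(weight)
```

With hooks, each patched layer would instead run its own `up @ down` projection on every forward call, which explains both the slowdown and the extra VRAM in the measurements below.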
Results:

Speed:
| LoRA count | hooks (it/s) | patching (it/s) |
|---|---|---|
| 0 | ~4.92 | ~4.92 |
| 1 | ~3.51 | ~4.89 |
| 2 | ~2.76 | ~4.92 |
VRAM:
| LoRA count | hooks (GB) | patching (GB) |
|---|---|---|
| 0 | ~3.6 | ~3.6 |
| 1 | ~4.0 | ~3.6 |
| 2 | ~4.4 | ~3.7 |
This PR is based on #3547, so it should wait for that one to merge first.