ComfyUI
retune lowVramPatch VRAM accounting
In the lowvram case, the patch math is now done in the model dtype after de-quantization, so account for that in the VRAM estimate. The patching was also put back on the compute stream, which takes it off the peak, so relax the MATH_FACTOR to only 2x and drop the worst-case assumption that everything peaks at once.
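
As a rough illustration of the accounting change (a minimal sketch; `estimate_patch_vram` and its signature are hypothetical names for this example, not ComfyUI's actual API), the estimate is sized from the de-quantized model dtype rather than the stored fp8 dtype, with the peak multiplier relaxed to 2x:

```python
import torch

# Illustrative sketch only: MATH_FACTOR and estimate_patch_vram are
# hypothetical names, not ComfyUI's actual identifiers.
MATH_FACTOR = 2  # relaxed from the old worst-case multiplier

def estimate_patch_vram(weight: torch.Tensor, model_dtype: torch.dtype) -> int:
    """Rough bytes needed to patch one lowvram weight (e.g. apply a LoRA).

    The patch math runs in the de-quantized model dtype (bf16/fp16),
    not in the stored dtype (e.g. fp8), so size the temporaries from that.
    """
    element_size = torch.empty(0, dtype=model_dtype).element_size()
    dequant_bytes = weight.numel() * element_size
    # Patching happens on the compute stream, so its temporaries no longer
    # have to be assumed to coincide with every other allocation's peak;
    # 2x covers the de-quantized weight plus the patched result.
    return dequant_bytes * MATH_FACTOR

# Example: a 4096x4096 fp8 weight patched in bf16 needs ~64 MiB of headroom.
print(estimate_patch_vram(torch.empty(4096, 4096, dtype=torch.float8_e4m3fn),
                          torch.bfloat16))
```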
RTX 3060, flux2 fp8 with a LoRA:
Before:
After: