rattus
Pretty much every error cudaHostRegister can throw also queues the same error on the async GPU queue. This was fixed for the repinning error case, but there is still the bad mmap...
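A minimal toy model of the deferred-error pattern described above (the class and names are illustrative, not CUDA's real API): a failed host-register call leaves an error queued that surfaces on a later, unrelated async call, so a robust handler must both catch the immediate exception and drain the queued one.

```python
class AsyncQueue:
    """Toy stand-in for CUDA's deferred-error behaviour (illustrative only)."""

    def __init__(self):
        self.pending_error = None

    def host_register(self, ok):
        # A failed registration raises immediately AND leaves an error
        # queued on the async side, mimicking the double-report problem.
        if not ok:
            self.pending_error = RuntimeError("host register failed")
            raise self.pending_error

    def launch(self):
        # An unrelated later op trips over the stale queued error.
        if self.pending_error is not None:
            err, self.pending_error = self.pending_error, None
            raise err


def safe_register(queue, ok):
    """Catch the immediate failure and drain the queued copy of it."""
    try:
        queue.host_register(ok)
    except RuntimeError:
        queue.pending_error = None  # drain, so later launches don't fail
        return False
    return True
```

The point of the sketch is that handling only the synchronous exception is not enough; the queued copy must also be cleared or a later, unrelated call inherits the failure.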
Add accounting for the VRAM cost of weight offloading to avoid VRAM OOMs that occur due to the offload process having to buffer and manipulate weights. This is particularly a...
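A sketch of what such accounting might look like (the function and its staging factor are hypothetical, not the PR's actual numbers): before offloading, reserve headroom for the temporary buffers the offload process needs while manipulating the weight.

```python
def can_offload(free_vram_bytes, weight_bytes, staging_factor=2):
    # Hypothetical accounting: offloading must temporarily hold the
    # weight plus staging copies in VRAM while it is buffered and
    # manipulated, so require staging_factor x the weight size free.
    return free_vram_bytes >= weight_bytes * staging_factor
```

Without the factor, a weight that "fits" on paper can still OOM mid-offload because of the transient copies.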
Draft of a generic module prefetcher. Implements the core feature and gives one example of how to use it with QWEN. This is able to get very close to compute...
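As a sketch of what a generic module prefetcher can look like (the class and parameters here are hypothetical, not the draft's actual implementation): load the next module's weights on a background thread while the current module computes, so loading overlaps with compute instead of serializing with it.

```python
from concurrent.futures import ThreadPoolExecutor


class Prefetcher:
    """Hypothetical sketch: overlap weight loading with compute."""

    def __init__(self, modules, load):
        self.modules = modules  # callables executed in order
        self.load = load        # loads one module's weights
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.pending = None

    def run(self, x):
        for i, module in enumerate(self.modules):
            # Wait for this module's weights (or load them synchronously
            # on the first iteration).
            if self.pending is None:
                self.load(module)
            else:
                self.pending.result()
            # Kick off the next module's load while this one computes.
            if i + 1 < len(self.modules):
                self.pending = self.pool.submit(self.load, self.modules[i + 1])
            else:
                self.pending = None
            x = module(x)
        return x
```

In a real model the `load` callable would move a module's weights onto the device; here it is injected so the scheduling logic stands alone.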
This PR expands the robustness of the RAM cache implementation. This makes the RAM cache much friendlier to use and avoids users needing to specifically size the cache based on...
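A sketch of the idea of sizing a cache by free system RAM rather than a fixed user-chosen budget (the class and its parameters are hypothetical, not this PR's implementation): evict least-recently-used entries whenever available RAM drops below a reserve.

```python
from collections import OrderedDict


class RamCache:
    """Hypothetical sketch: size the cache by free RAM, not a fixed budget."""

    def __init__(self, free_ram, reserve_bytes):
        self.free_ram = free_ram      # callable, e.g. psutil.virtual_memory().available
        self.reserve = reserve_bytes  # headroom to keep free at all times
        self.items = OrderedDict()    # key -> (value, size_bytes)

    def put(self, key, value, size_bytes):
        self.items[key] = (value, size_bytes)
        self.items.move_to_end(key)
        # Evict least-recently-used entries until the reserve is honoured.
        while self.free_ram() < self.reserve and len(self.items) > 1:
            self.items.popitem(last=False)

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)
        return self.items[key][0]
```

Because the eviction trigger is live free memory, the user no longer has to guess a byte budget that matches their machine.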
Slow down the CPU on model load so it does not run ahead of the GPU. This fixes a VRAM OOM on Flux 2 load. I went to try and debug this with the memory trace...
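One way to keep the CPU from running ahead is to synchronize every N queued operations; this sketch (hypothetical names, with the sync callable injected, e.g. `torch.cuda.synchronize` in practice) shows that throttling shape.

```python
class LoadThrottle:
    """Hypothetical sketch: sync every N queued copies so the CPU cannot
    queue unbounded work (and its VRAM) ahead of the device during load."""

    def __init__(self, sync_interval, sync_fn):
        self.interval = sync_interval
        self.sync = sync_fn  # e.g. torch.cuda.synchronize
        self.count = 0

    def tick(self):
        self.count += 1
        if self.count % self.interval == 0:
            self.sync()  # block until the queued work has drained
```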
This is, hopefully, the full root-cause fix for: https://github.com/comfyanonymous/ComfyUI/issues/10891 Primary commit message: ``` commit 53bd09926cf0f680d0fd67afcb2d0a289d71940d Author: Rattus Date: Sun Dec 7 21:23:05 2025 +1000 Account for dequantization and type-casts...
In the lowvram case, this now does its math in the model dtype post-dequantization, so account for that. The patching was also put back on the compute stream getting it...
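A sketch of the accounting idea (helper and dtype table are hypothetical): the lowvram estimate should use the post-dequantization compute dtype's element size, not just the quantized storage dtype's, since the weight transiently occupies the larger of the two.

```python
DTYPE_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def lowvram_weight_bytes(numel, storage_dtype, compute_dtype):
    # Hypothetical helper: a quantized weight occupies its storage size
    # at rest, but once dequantized/cast to the model dtype for compute
    # it occupies the compute-dtype size; budget for the larger of the two.
    at_rest = numel * DTYPE_BYTES[storage_dtype]
    at_compute = numel * DTYPE_BYTES[compute_dtype]
    return max(at_rest, at_compute)
```

Budgeting only the fp8 storage size would under-count a weight that is dequantized to bf16 before use, which is the class of OOM this fix targets.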
Merge v0.3.77 back to master without code changes. This is a history-only merge. It will allow v0.3.77 users to git pull and be moved straight to v0.4.x...
This operation does a torch.cat on latents which, with --gpu-only, may not be on the GPU. The two VAE results will follow the --gpu-only defined behaviour, so follow the inpaint...
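torch.cat requires all inputs on the same device, so a fix of this shape normalizes the devices first; a minimal sketch (the helper name is hypothetical, not the PR's code):

```python
import torch


def cat_latents(a: torch.Tensor, b: torch.Tensor, dim: int = 0) -> torch.Tensor:
    # Hypothetical helper: torch.cat fails if its inputs live on
    # different devices, so move the second latent to the first's
    # device before concatenating.
    return torch.cat((a, b.to(a.device)), dim=dim)
```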