rattus
Pretty much every error cudaHostRegister can throw also queues the same error on the async GPU queue. This was fixed for the repinning error case, but there is still the bad mmap...
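A minimal toy model of the deferred-error pattern described above (the class and names are illustrative, not CUDA's real API): a failed host-register call leaves an error queued that surfaces on a later, unrelated async call, so a robust handler must both catch the immediate exception and drain the queued one.

```python
class AsyncQueue:
    """Toy stand-in for CUDA's deferred-error behaviour (illustrative only)."""

    def __init__(self):
        self.pending_error = None

    def host_register(self, ok):
        # A failed registration raises immediately AND leaves an error
        # queued on the async side, mimicking the double-report problem.
        if not ok:
            self.pending_error = RuntimeError("host register failed")
            raise self.pending_error

    def launch(self):
        # An unrelated later op trips over the stale queued error.
        if self.pending_error is not None:
            err, self.pending_error = self.pending_error, None
            raise err


def safe_register(queue, ok):
    """Catch the immediate failure and drain the queued copy of it."""
    try:
        queue.host_register(ok)
    except RuntimeError:
        queue.pending_error = None  # drain, so later launches don't fail
        return False
    return True
```

The point of the sketch is that handling only the synchronous exception is not enough; the queued copy must also be cleared or a later, unrelated call inherits the failure.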
Add accounting for the VRAM cost of weight offloading to avoid VRAM OOMs that occur due to the offload process having to buffer and manipulate weights. This is particularly a...
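A sketch of what such accounting might look like (the function and its staging factor are hypothetical, not the PR's actual numbers): before offloading, reserve headroom for the temporary buffers the offload process needs while manipulating the weight.

```python
def can_offload(free_vram_bytes, weight_bytes, staging_factor=2):
    # Hypothetical accounting: offloading must temporarily hold the
    # weight plus staging copies in VRAM while it is buffered and
    # manipulated, so require staging_factor x the weight size free.
    return free_vram_bytes >= weight_bytes * staging_factor
```

Without the factor, a weight that "fits" on paper can still OOM mid-offload because of the transient copies.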
Draft of a generic module prefetcher. Implements the core feature and gives one example of how to use it with QWEN. This is able to get very close to compute...
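As a sketch of what a generic module prefetcher can look like (the class and parameters here are hypothetical, not the draft's actual implementation): load the next module's weights on a background thread while the current module computes, so loading overlaps with compute instead of serializing with it.

```python
from concurrent.futures import ThreadPoolExecutor


class Prefetcher:
    """Hypothetical sketch: overlap weight loading with compute."""

    def __init__(self, modules, load):
        self.modules = modules  # callables executed in order
        self.load = load        # loads one module's weights
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.pending = None

    def run(self, x):
        for i, module in enumerate(self.modules):
            # Wait for this module's weights (or load them synchronously
            # on the first iteration).
            if self.pending is None:
                self.load(module)
            else:
                self.pending.result()
            # Kick off the next module's load while this one computes.
            if i + 1 < len(self.modules):
                self.pending = self.pool.submit(self.load, self.modules[i + 1])
            else:
                self.pending = None
            x = module(x)
        return x
```

In a real model the `load` callable would move a module's weights onto the device; here it is injected so the scheduling logic stands alone.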
This PR expands the robustness of the RAM cache implementation. This makes the RAM cache much friendlier to use and avoids users needing to specifically size the cache based on...
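A sketch of the idea of sizing a cache by free system RAM rather than a fixed user-chosen budget (the class and its parameters are hypothetical, not this PR's implementation): evict least-recently-used entries whenever available RAM drops below a reserve.

```python
from collections import OrderedDict


class RamCache:
    """Hypothetical sketch: size the cache by free RAM, not a fixed budget."""

    def __init__(self, free_ram, reserve_bytes):
        self.free_ram = free_ram      # callable, e.g. psutil.virtual_memory().available
        self.reserve = reserve_bytes  # headroom to keep free at all times
        self.items = OrderedDict()    # key -> (value, size_bytes)

    def put(self, key, value, size_bytes):
        self.items[key] = (value, size_bytes)
        self.items.move_to_end(key)
        # Evict least-recently-used entries until the reserve is honoured.
        while self.free_ram() < self.reserve and len(self.items) > 1:
            self.items.popitem(last=False)

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)
        return self.items[key][0]
```

Because the eviction trigger is live free memory, the user no longer has to guess a byte budget that matches their machine.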
Slow down the CPU on model load so it does not run ahead of the GPU. This fixes a VRAM OOM on Flux 2 load. I went to try and debug this with the memory trace...
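One way to keep the CPU from running ahead is to synchronize every N queued operations; this sketch (hypothetical names, with the sync callable injected, e.g. `torch.cuda.synchronize` in practice) shows that throttling shape.

```python
class LoadThrottle:
    """Hypothetical sketch: sync every N queued copies so the CPU cannot
    queue unbounded work (and its VRAM) ahead of the device during load."""

    def __init__(self, sync_interval, sync_fn):
        self.interval = sync_interval
        self.sync = sync_fn  # e.g. torch.cuda.synchronize
        self.count = 0

    def tick(self):
        self.count += 1
        if self.count % self.interval == 0:
            self.sync()  # block until the queued work has drained
```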
This is, hopefully, the full root-cause fix for: https://github.com/comfyanonymous/ComfyUI/issues/10891 Primary commit message: ``` commit 53bd09926cf0f680d0fd67afcb2d0a289d71940d Author: Rattus Date: Sun Dec 7 21:23:05 2025 +1000 Account for dequantization and type-casts...
In the lowvram case, this now does its math in the model dtype post-dequantization, so account for that. The patching was also put back on the compute stream getting it...
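A sketch of the accounting idea (helper and dtype table are hypothetical): the lowvram estimate should use the post-dequantization compute dtype's element size, not just the quantized storage dtype's, since the weight transiently occupies the larger of the two.

```python
DTYPE_BYTES = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def lowvram_weight_bytes(numel, storage_dtype, compute_dtype):
    # Hypothetical helper: a quantized weight occupies its storage size
    # at rest, but once dequantized/cast to the model dtype for compute
    # it occupies the compute-dtype size; budget for the larger of the two.
    at_rest = numel * DTYPE_BYTES[storage_dtype]
    at_compute = numel * DTYPE_BYTES[compute_dtype]
    return max(at_rest, at_compute)
```

Budgeting only the fp8 storage size would under-count a weight that is dequantized to bf16 before use, which is the class of OOM this fix targets.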
Merge v0.3.77 back to master without code changes. This is a history-only merge. It will allow v0.3.77 users to git pull and be moved straight to v0.4.x...
This operation does a torch.cat on latents which, with --gpu-only, may not be on the GPU. The two VAE results will follow the --gpu-only defined behaviour, so follow the inpaint...
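torch.cat requires all inputs on the same device, so a fix of this shape normalizes the devices first; a minimal sketch (the helper name is hypothetical, not the PR's code):

```python
import torch


def cat_latents(a: torch.Tensor, b: torch.Tensor, dim: int = 0) -> torch.Tensor:
    # Hypothetical helper: torch.cat fails if its inputs live on
    # different devices, so move the second latent to the first's
    # device before concatenating.
    return torch.cat((a, b.to(a.device)), dim=dim)
```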