After updating to ComfyUI V0.3.59, some nodes such as VAE encode/decode and image upscaling have become extremely slow and laggy.
Custom Node Testing
- [ ] I have tried disabling custom nodes and the issue persists (see how to disable custom nodes if you need help)
Your question
After the update to V0.3.59, the VAE encode/decode and image upscaling nodes consume significantly more VRAM and run very slowly, whereas they worked normally in V0.3.57 and earlier. RTX 4080 with 64 GB of system RAM.
Thank you very much. I hope to get some help
Logs
Other
No response
I'm struggling with this as well. All of my current workflows are broken. I have a 4090 and can no longer process any SDXL images. VRAM usage spikes immediately to 24 GB and the VAE fails while trying to switch to tiled mode. I installed a second copy of ComfyUI with the ComfyUI Easy Installer from Pixorama and had the same problem. What I found is that removing the rgthree node pack makes things functional again, just none of my workflows work. Default simple ComfyUI workflows work properly.
Yes, especially VAE encoding and decoding, which also consume a lot of video memory and are 10 times slower than before
Same issue, so I added nodes like 'clean VRAM used', 'set reserved VRAM', and 'delay' to make the workflow run normally.
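For context, those "clean VRAM" style nodes are only a workaround, and most of them boil down to asking PyTorch to release its cached allocations between heavy steps. A minimal sketch of that idea (not any particular node pack's actual implementation):

import gc
import torch

def free_cached_vram():
    # Drop Python-side references first so the allocator can actually release blocks.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # return cached, unused blocks to the driver
        torch.cuda.ipc_collect()   # clean up CUDA IPC handles, if any were created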
I found the reason: it was caused by 'llama-cpp-python'. I just needed to delete the 'ComfyUI-JoyCaption' and 'ComfyUI-MiniCPM' plugins, because they use 'llama_cpp_install.py'. Deleting them restores normal behavior. It has nothing to do with ComfyUI itself.
Deleting these two plugins still didn't work. They still consume a lot of VRAM and are very slow. I have reverted back to version 0.3.56
By the way, I have 16G VRAM, and I have successfully tested it on pytorch version: 2.7.1+cu126 and pytorch version: 2.6+cu126.
I have an RTX 4080 with 16 GB VRAM + 64 GB of RAM. Python version: 3.12.3, pytorch version: 2.7.0+cu128.
Just like you. Previously, processing images of this size wouldn't take more than a second on a 5090. Now the only way I can fix the problem is to roll back to the previous version of ComfyUI.
Can you try after disabling custom nodes? This will help identify whether the issue should be addressed here or somewhere else.
I'm experiencing the same issue. After upgrading from v0.3.56 to v0.3.57, the VAE decode process has become extremely slow and the CUDA utilization is very low
I installed JoyCaption yesterday and now run into this issue; after disabling JoyCaption again it works as before. Besides the VAE, which makes VRAM usage explode, it also heavily impacts Ultralytics BBOX models. Very strange behavior. If anyone has an explanation it would be appreciated.
You can install other versions without using 'llama_cpp_install.py'.
This is a problem with ComfyUI itself; don't look for the problem elsewhere. It has happened before. Why else would V0.3.56 work normally? Don't you think so?
If that method doesn't work for you, you can try reinstalling ComfyUI. I also rolled back to 3.56 at first, but later I found the specific cause, and now I have no problem using 3.60.
What is the specific reason? Can you tell me? Even after reinstalling ComfyUI without any other nodes, if I update to any version higher than 3.56 it happens again.
Check your ComfyUI startup parameters.
A friend (4090) had the same issue and asked me for help. After investigating, I found that the problem might be related to cuDNN autotune. If the --fast parameter is present on the startup command line, it enables the benchmarking functionality and causes SDXL to allocate over 24 GB of VRAM when executing KSampler.
I was able to reproduce the issue on my own old build. This problem isn't limited to 30/40-series GPUs. Even 20-series cards (such as my Titan RTX) experience it.
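For anyone who wants to see the mechanism rather than take it on faith: the autotune feature simply sets torch.backends.cudnn.benchmark = True (see the comfy/ops.py snippet further down), which makes cuDNN profile several convolution algorithms, some with large workspaces, the first time it sees a new input shape. A toy, standalone illustration, not ComfyUI code, and exact numbers will vary by GPU and driver:

import torch

# This is the flag that ComfyUI's autotune feature turns on via --fast.
torch.backends.cudnn.benchmark = True

conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1).cuda()

# VAE decode / upscale workflows feed convolutions many distinct resolutions;
# with benchmark=True each new shape triggers a profiling pass that can
# allocate sizable cuDNN workspaces, which shows up as VRAM spikes.
for size in (512, 768, 1024):
    x = torch.randn(1, 4, size, size, device="cuda")
    conv(x)
    torch.cuda.synchronize()
    print(size, torch.cuda.max_memory_allocated() // 2**20, "MiB peak allocated")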
My normal startup parameters
py ComfyUI\main.py --windows-standalone-build --listen 0.0.0.0 --port 58188
pause
Logs
Checkpoint files will always be loaded safely.
Total VRAM 24576 MB, total RAM 130756 MB
pytorch version: 2.8.0+cu128
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA TITAN RTX : cudaMallocAsync
Using pytorch attention
Python version: 3.13.5 (tags/v3.13.5:6cb20a2, Jun 11 2025, 16:15:46) [MSC v.1943 64 bit (AMD64)]
ComfyUI version: 0.3.62
ComfyUI frontend version: 1.26.13
Reproduce the issue
py ComfyUI\main.py --fast --windows-standalone-build --listen 0.0.0.0 --port 58188
pause
Logs
Checkpoint files will always be loaded safely.
Total VRAM 24576 MB, total RAM 130756 MB
pytorch version: 2.8.0+cu128
Enabled fp16 accumulation.
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA TITAN RTX : cudaMallocAsync
Using pytorch attention
Python version: 3.13.5 (tags/v3.13.5:6cb20a2, Jun 11 2025, 16:15:46) [MSC v.1943 64 bit (AMD64)]
ComfyUI version: 0.3.62
ComfyUI frontend version: 1.26.13
Solution 1, remove --fast from startup parameters (Recommended)
Simply removing --fast will solve the issue, but you will lose the other optimizations. In practice, not much changes.
EDIT1: With both --fast and --use-sage-attention (the library needs to be compiled in a VS2022 environment), performance improves by about 10%~15%
py ComfyUI\main.py --fast --use-sage-attention --cuda-malloc --windows-standalone-build --listen 0.0.0.0 --port 58188
pause
EDIT2: With only --use-sage-attention, you lose about 3% performance
py ComfyUI\main.py --use-sage-attention --cuda-malloc --windows-standalone-build --listen 0.0.0.0 --port 58188
pause
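Note: --fast also accepts individual feature names (the PerformanceFeature values listed in Solution 2 below), so you may be able to keep fp16 accumulation while leaving autotune off; a later reply in this thread reports exactly that working on v0.3.72. Untested on my build, but it would look like:
py ComfyUI\main.py --fast fp16_accumulation --use-sage-attention --cuda-malloc --windows-standalone-build --listen 0.0.0.0 --port 58188
pause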
Solution 2, modify your ComfyUI
WARNING: This method may prevent you from upgrading ComfyUI via Git from the official channel in the future. First upgrade to 0.3.62 (latest).
Modify comfy/cli_args.py
https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/cli_args.py
Comment out line 146.
class PerformanceFeature(enum.Enum):
    Fp16Accumulation = "fp16_accumulation"
    Fp8MatrixMultiplication = "fp8_matrix_mult"
    CublasOps = "cublas_ops"
    #AutoTune = "autotune"  # Disable autotune
Modify comfy/ops.py
https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/ops.py
Comment out lines 55 and 56.
cast_to = comfy.model_management.cast_to #TODO: remove once no more references

#if torch.cuda.is_available() and torch.backends.cudnn.is_available() and PerformanceFeature.AutoTune in args.fast:
#    torch.backends.cudnn.benchmark = True

def cast_to_input(weight, input, non_blocking=False, copy=True):
    return comfy.model_management.cast_to(weight, input.dtype, input.device, non_blocking=non_blocking, copy=copy)
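If you'd rather not edit ComfyUI's own files, another option (an untested sketch, with a hypothetical folder name) is a tiny custom node package whose only job is to flip the flag back off. Custom nodes are imported after comfy/ops.py has already run, so the reset should take effect:

# ComfyUI/custom_nodes/disable_cudnn_autotune/__init__.py  (hypothetical package)
import torch

# comfy/ops.py sets this to True when --fast enables autotune; custom nodes
# load afterwards, so turning it back off here disables cuDNN benchmarking
# without touching ComfyUI's source.
if torch.cuda.is_available() and torch.backends.cudnn.is_available():
    torch.backends.cudnn.benchmark = False
    print("[disable_cudnn_autotune] cudnn benchmark disabled")

# ComfyUI looks for these mappings in every custom node package; empty dicts
# are fine because this package registers no nodes.
NODE_CLASS_MAPPINGS = {}
NODE_DISPLAY_NAME_MAPPINGS = {}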
This is something I've been experiencing myself. Ever since v0.3.57 something has changed and it causes erratic spikes in VRAM usage that slow everything down.
Checkpoint files will always be loaded safely.
Total VRAM 24576 MB, total RAM 65299 MB
pytorch version: 2.8.0+cu129
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 : cudaMallocAsync
Using sage attention
Python version: 3.13.6 (tags/v3.13.6:4e66535, Aug 6 2025, 14:36:00) [MSC v.1944 64 bit (AMD64)]
ComfyUI version: 0.3.64
I've run a test on a clean portable installation, no custom nodes, sage-attention installed, and running a basic SDXL t2i looks like this:
Constant jumps to maximum utilization that then drop off. Erratic VRAM allocation behavior that slows down the entire sampling process, and it's even worse in my usual workflow, which also does ControlNet and ESRGAN upscaling; the flow chokes on every step that requires loading a model.
After following @mirabarukaso's advice and removing the --fast flag, the problem was solved. 8 GB of VRAM was quickly allocated by the model and sampling went on perfectly smoothly. v0.3.56 was the last version unaffected by this issue, but at least now I know what the culprit was, so I can finally update in peace.
It works! @mirabarukaso Big Thanks!!!
Rolling back to 3.56 was the only thing that fixed this for me. Thanks thread!
Updated to v0.3.72 and re-added --fast fp16_accumulation to the launch parameters. The erratic VRAM allocation issue seems to be gone and the workflow runs as expected. Worth testing further to see if it's resolved.