
After updating ComfyUI to V0.3.59, some nodes, such as VAE encoding and decoding and image upscaling, have become extremely slow and laggy.

Open kajfblsdkgnlsndg opened this issue 4 months ago • 20 comments

Custom Node Testing

Your question

After the update to version V0.3.59, the nodes for VAE encoding and decoding, as well as image upscaling, consume significantly more video memory and operate very slowly, whereas they functioned normally in versions V0.3.57 and below. RTX 4080 with 64 GB of RAM.

[Screenshots: V0.3.59 vs. V0.3.57]

Thank you very much. I hope to get some help.

Logs


Other

No response

kajfblsdkgnlsndg avatar Sep 11 '25 15:09 kajfblsdkgnlsndg

I'm struggling with this as well. All of my current workflows are broken. I have a 4090 and can no longer process any SDXL images. VRAM spikes immediately up to 24 GB and it fails, with the VAE trying to switch to tiled mode. I installed a second copy of ComfyUI with the ComfyUI Easy Installer from Pixorama and had the same problem. What I found is that removing the rgthree node pack makes things functional again, though still none of my own workflows run; default simple ComfyUI workflows work properly.

kidkool28 avatar Sep 11 '25 18:09 kidkool28

I'm struggling with this as well. All of my current workflows are broken. I have a 4090 and can no longer process any SDXL images. VRAM spikes immediately up to 24 GB and it fails, with the VAE trying to switch to tiled mode. I installed a second copy of ComfyUI with the ComfyUI Easy Installer from Pixorama and had the same problem. What I found is that removing the rgthree node pack makes things functional again, though still none of my own workflows run; default simple ComfyUI workflows work properly.

Yes, especially VAE encoding and decoding, which also consume a lot of video memory and are 10 times slower than before.

kajfblsdkgnlsndg avatar Sep 13 '25 01:09 kajfblsdkgnlsndg

Same issue, so I added nodes like 'clean VRAM used', 'set reserved VRAM', and 'delay' to make the workflow run normally.
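
For context, here is a minimal sketch, assuming plain PyTorch and not taken from any particular node pack, of the cache-clearing step that 'clean VRAM' style workaround nodes typically perform:

import gc
import torch

def free_cached_vram():
    # Drop Python-side references first so the allocator can actually release them.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # return cached blocks to the CUDA driver
        torch.cuda.ipc_collect()   # clean up leftover inter-process memory handles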

cdmusic2019 avatar Sep 17 '25 14:09 cdmusic2019

I found the cause: it was 'llama-cpp-python'. I just needed to delete the 'ComfyUI-JoyCaption' and 'ComfyUI-MiniCPM' plugins, because they use 'llama_cpp_install.py'. Deleting them restores normal behavior. It has nothing to do with ComfyUI itself.

cdmusic2019 avatar Sep 22 '25 03:09 cdmusic2019

I found the cause: it was 'llama-cpp-python'. I just needed to delete the 'ComfyUI-JoyCaption' and 'ComfyUI-MiniCPM' plugins, because they use 'llama_cpp_install.py'. Deleting them restores normal behavior. It has nothing to do with ComfyUI itself.

Deleting these two plugins still didn't work; the nodes still consume a lot of VRAM and are very slow. I have reverted back to version 0.3.56.


kajfblsdkgnlsndg avatar Sep 22 '25 11:09 kajfblsdkgnlsndg

By the way, I have 16 GB of VRAM, and I have successfully tested it with PyTorch 2.7.1+cu126 and PyTorch 2.6+cu126.

cdmusic2019 avatar Sep 22 '25 11:09 cdmusic2019

By the way, I have 16 GB of VRAM, and I have successfully tested it with PyTorch 2.7.1+cu126 and PyTorch 2.6+cu126.

I have an RTX 4080 with 16 GB of VRAM and 64 GB of RAM. Python version: 3.12.3, PyTorch version: 2.7.0+cu128.

kajfblsdkgnlsndg avatar Sep 22 '25 13:09 kajfblsdkgnlsndg

Just like you. Processing images of this size used to take less than a second on a 5090. Now I can only roll back to the previous version of ComfyUI to fix the problem.


Inmanguo avatar Sep 22 '25 14:09 Inmanguo

Can you try after disabling custom nodes? This will help identify whether the issue should be addressed here or somewhere else.

christian-byrne avatar Sep 22 '25 16:09 christian-byrne

I'm experiencing the same issue. After upgrading from v0.3.56 to v0.3.57, the VAE decode process has become extremely slow and CUDA utilization is very low.

darklight1992 avatar Sep 24 '25 05:09 darklight1992

I found the cause: it was 'llama-cpp-python'. I just needed to delete the 'ComfyUI-JoyCaption' and 'ComfyUI-MiniCPM' plugins, because they use 'llama_cpp_install.py'. Deleting them restores normal behavior. It has nothing to do with ComfyUI itself.

I installed JoyCaption yesterday and then ran into this issue; after disabling JoyCaption again it works as before. Besides the VAE, where it makes VRAM usage explode, it also heavily impacts Ultralytics BBOX models. Very strange behavior. If anyone has an explanation, it would be appreciated.

Njaecha avatar Sep 25 '25 09:09 Njaecha

I found the cause: it was 'llama-cpp-python'. I just needed to delete the 'ComfyUI-JoyCaption' and 'ComfyUI-MiniCPM' plugins, because they use 'llama_cpp_install.py'. Deleting them restores normal behavior. It has nothing to do with ComfyUI itself.

I installed JoyCaption yesterday and then ran into this issue; after disabling JoyCaption again it works as before. Besides the VAE, where it makes VRAM usage explode, it also heavily impacts Ultralytics BBOX models. Very strange behavior. If anyone has an explanation, it would be appreciated.

You can install other versions without using 'llama_cpp_install.py'.

cdmusic2019 avatar Sep 25 '25 12:09 cdmusic2019

I found the cause: it was 'llama-cpp-python'. I just needed to delete the 'ComfyUI-JoyCaption' and 'ComfyUI-MiniCPM' plugins, because they use 'llama_cpp_install.py'. Deleting them restores normal behavior. It has nothing to do with ComfyUI itself.

I installed JoyCaption yesterday and then ran into this issue; after disabling JoyCaption again it works as before. Besides the VAE, where it makes VRAM usage explode, it also heavily impacts Ultralytics BBOX models. Very strange behavior. If anyone has an explanation, it would be appreciated.

You can install other versions without using 'llama_cpp_install.py'.

This is a problem with ComfyUI itself; there's no need to look for the cause elsewhere. It has happened before. Why else would V0.3.56 work normally? Don't you think so?

kajfblsdkgnlsndg avatar Sep 26 '25 16:09 kajfblsdkgnlsndg

I found the cause: it was 'llama-cpp-python'. I just needed to delete the 'ComfyUI-JoyCaption' and 'ComfyUI-MiniCPM' plugins, because they use 'llama_cpp_install.py'. Deleting them restores normal behavior. It has nothing to do with ComfyUI itself.

This is a problem with ComfyUI itself; there's no need to look for the cause elsewhere. It has happened before. Why else would V0.3.56 work normally? Don't you think so?

If the method doesn't work for you, you can try reinstalling ComfyUI. I also rolled back to 3.56 at first, but later I found the specific cause, and now I have no problem using 3.60.

cdmusic2019 avatar Sep 27 '25 00:09 cdmusic2019

I found the cause: it was 'llama-cpp-python'. I just needed to delete the 'ComfyUI-JoyCaption' and 'ComfyUI-MiniCPM' plugins, because they use 'llama_cpp_install.py'. Deleting them restores normal behavior. It has nothing to do with ComfyUI itself.

This is a problem with ComfyUI itself; there's no need to look for the cause elsewhere. It has happened before. Why else would V0.3.56 work normally? Don't you think so?

If the method doesn't work for you, you can try reinstalling ComfyUI. I also rolled back to 3.56 at first, but later I found the specific cause, and now I have no problem using 3.60.

What is the specific reason? Can you tell me? Even if I reinstall ComfyUI without any other nodes, the problem happens again whenever I update to a version higher than 3.56.

darklight1992 avatar Sep 27 '25 06:09 darklight1992

Check your ComfyUI startup parameters.

A friend (4090) had the same issue and asked me for help. After investigating, I found that the problem might be related to cuDNN autotune. If the --fast parameter is present on the startup command line, it enables cuDNN benchmarking and causes SDXL to allocate over 24 GB of VRAM when executing KSampler. I was able to reproduce the issue on my own old build. The problem isn't limited to 30/40-series GPUs; even 20-series cards (such as my Titan RTX) experience it.
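
For illustration, here is a minimal standalone PyTorch sketch (not ComfyUI code; the layer and shapes are arbitrary) of why per-shape benchmarking gets expensive when every call sees a new resolution, as VAE decode and upscaling do:

import time
import torch

def time_convs(benchmark: bool) -> float:
    # With benchmark=True, cuDNN re-tunes its convolution algorithm (and
    # allocates trial workspaces) for every input shape it has not seen yet.
    torch.backends.cudnn.benchmark = benchmark
    conv = torch.nn.Conv2d(4, 4, 3, padding=1).cuda().half()
    torch.cuda.synchronize()
    start = time.time()
    for size in (512, 640, 768, 896, 1024):   # a new shape on every call
        x = torch.randn(1, 4, size, size, device="cuda", dtype=torch.float16)
        conv(x)
    torch.cuda.synchronize()
    return time.time() - start

if torch.cuda.is_available():
    print("benchmark=False:", time_convs(False))
    print("benchmark=True: ", time_convs(True))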

My normal startup parameters

py ComfyUI\main.py --windows-standalone-build --listen 0.0.0.0 --port 58188
pause

Logs

Checkpoint files will always be loaded safely.
Total VRAM 24576 MB, total RAM 130756 MB
pytorch version: 2.8.0+cu128
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA TITAN RTX : cudaMallocAsync
Using pytorch attention
Python version: 3.13.5 (tags/v3.13.5:6cb20a2, Jun 11 2025, 16:15:46) [MSC v.1943 64 bit (AMD64)]
ComfyUI version: 0.3.62
ComfyUI frontend version: 1.26.13

Reproduce the issue

py ComfyUI\main.py --fast --windows-standalone-build --listen 0.0.0.0 --port 58188
pause

Logs

Checkpoint files will always be loaded safely.
Total VRAM 24576 MB, total RAM 130756 MB
pytorch version: 2.8.0+cu128
Enabled fp16 accumulation.
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA TITAN RTX : cudaMallocAsync
Using pytorch attention
Python version: 3.13.5 (tags/v3.13.5:6cb20a2, Jun 11 2025, 16:15:46) [MSC v.1943 64 bit (AMD64)]
ComfyUI version: 0.3.62
ComfyUI frontend version: 1.26.13

Solution 1: remove --fast from the startup parameters (recommended)

Simply removing --fast solves the issue, but you lose the other optimizations. In practice, not much changes.

EDIT 1: Using both --fast and --use-sage-attention (the library needs to be compiled in a VS2022 environment) gives a speedup of roughly 10%~15%.

py ComfyUI\main.py --fast --use-sage-attention --cuda-malloc --windows-standalone-build --listen 0.0.0.0 --port 58188
pause

EDIT 2: Using only --use-sage-attention loses about 3% performance.

py ComfyUI\main.py --use-sage-attention --cuda-malloc --windows-standalone-build --listen 0.0.0.0 --port 58188
pause

Solution 2: modify your ComfyUI

WARNING: This method may prevent you from upgrading ComfyUI via Git through the official channel in the future. First upgrade to 0.3.62 (latest).

Modify comfy/cli_args.py (https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/cli_args.py) and comment out line 146.

class PerformanceFeature(enum.Enum):
    Fp16Accumulation = "fp16_accumulation"
    Fp8MatrixMultiplication = "fp8_matrix_mult"
    CublasOps = "cublas_ops"
    #AutoTune = "autotune"  # Disable autotune

Modify comfy/ops.py (https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/ops.py) and comment out lines 55 and 56.

cast_to = comfy.model_management.cast_to #TODO: remove once no more references

#if torch.cuda.is_available() and torch.backends.cudnn.is_available() and PerformanceFeature.AutoTune in args.fast:
    #torch.backends.cudnn.benchmark = True

def cast_to_input(weight, input, non_blocking=False, copy=True):
    return comfy.model_management.cast_to(weight, input.dtype, input.device, non_blocking=non_blocking, copy=copy)
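
If you'd rather not edit ComfyUI's own files, an untested alternative sketch: drop a tiny package into custom_nodes/ that flips the flag back. Custom node packages are imported after comfy.ops, so the assignment below should run after ComfyUI has already enabled benchmarking for a bare --fast (the package name and this approach are my own suggestion, not an official mechanism):

# custom_nodes/disable_cudnn_autotune/__init__.py  (hypothetical package)
import torch

if torch.cuda.is_available() and torch.backends.cudnn.is_available():
    torch.backends.cudnn.benchmark = False   # undo the autotune flag set by comfy/ops.py

# Required exports even though this package registers no actual nodes.
NODE_CLASS_MAPPINGS = {}
NODE_DISPLAY_NAME_MAPPINGS = {}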

mirabarukaso avatar Oct 06 '25 17:10 mirabarukaso

This is something I've been experiencing myself. Ever since v0.3.57 something has changed and it causes erratic spikes in VRAM usage that slow everything down.

Checkpoint files will always be loaded safely.
Total VRAM 24576 MB, total RAM 65299 MB
pytorch version: 2.8.0+cu129
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 : cudaMallocAsync
Using sage attention
Python version: 3.13.6 (tags/v3.13.6:4e66535, Aug  6 2025, 14:36:00) [MSC v.1944 64 bit (AMD64)]
ComfyUI version: 0.3.64

I've run a test on a clean portable installation, no custom nodes, sage attention installed, and running a basic SDXL t2i looks like this:


Constant jumps to maximum utilization that then drop off. The erratic VRAM allocation behavior slows down the entire sampling process, and it's even worse in my usual workflow, which also does ControlNet and ESRGAN upscaling: the flow chokes on every step that requires loading a model.

After following @mirabarukaso's advice and removing the --fast flag, the problem was solved: 8 GB of VRAM was quickly allocated by the model and sampling went perfectly smoothly. v0.3.56 was the last version unaffected by this issue, but at least now I know what the culprit was, so I can finally update in peace.

supra107 avatar Oct 09 '25 11:10 supra107

Check your ComfyUI startup parameters.

A friend (4090) had the same issue and asked me for help. After investigating, I found that the problem might be related to cuDNN autotune. If the --fast parameter is present on the startup command line, it enables cuDNN benchmarking and causes SDXL to allocate over 24 GB of VRAM when executing KSampler. I was able to reproduce the issue on my own old build. The problem isn't limited to 30/40-series GPUs; even 20-series cards (such as my Titan RTX) experience it.


It works! @mirabarukaso Big Thanks!!!

Inmanguo avatar Oct 13 '25 04:10 Inmanguo

Rolling back to 3.56 was the only thing that fixed this for me. Thanks thread!

0ucb avatar Nov 14 '25 03:11 0ucb

Updated to v0.3.72 and re-added --fast fp16_accumulation to the launch parameters. The erratic VRAM allocation issue seems to be gone and the workflow runs as expected. Worth testing further to see if it's resolved.
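
If anyone wants to confirm which features a given --fast invocation actually enabled, here is a hedged sketch that could go into a custom node's __init__.py, reusing the PerformanceFeature enum and args.fast that comfy/ops.py itself checks (the fallback for a missing value is my own precaution):

from comfy.cli_args import args, PerformanceFeature

# Print which --fast performance features ComfyUI parsed at startup.
enabled = {feature.value for feature in PerformanceFeature if feature in (args.fast or ())}
print(f"[fast-check] enabled --fast features: {enabled or 'none'}")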

supra107 avatar Nov 25 '25 22:11 supra107