
RAM Memory Leak

Open • SilverJim opened this issue 1 month ago • 6 comments

Custom Node Testing

Expected Behavior

Image Batch frees its committed RAM after use, so the user can generate many long videos without restarting ComfyUI.

Actual Behavior

Image Batch does not free its committed RAM even after the workflow finishes. If I keep running the workflow multiple times without restarting ComfyUI and generate too many long videos, memory is used up and the process is eventually killed by the OS because the virtual memory is exhausted.

Whenever I run the workflow (the input should be changed to avoid getting the result from cache without the node actually running), the commit size increases (in my case by about 10GB) and never decreases.

Steps to Reproduce

memory_leak_bug.json

Debug Logs

got prompt
Prompt executed in 2.24 seconds

Other

RAM: 64GB
VRAM: 24GB
OS: Windows 11

SilverJim • Dec 13 '25 02:12

I've noticed that Comfy caches things now. Here's an example:

  1. Start with a basic workflow to make an image
  2. Set the seed to fixed
  3. Generate an image
  4. Increase the seed by one
  5. Generate an image
  6. Decrease the seed by one to get back to your first generation's seed
  7. If you try to generate an image it will instantly show you the previously cached image without having to regenerate it from scratch

So I assume this is essentially what it's doing in your case. It will fill the cache until it runs out of room, then probably start culling the oldest entries. If you performed a test similar to the one I just described, the earliest of that 700 would likely have to actually be recalculated again; you might be keeping 400 in RAM at any given time. I don't think this is actually a memory leak, it's just aggressive caching.

You can use --cache-ram <value> to give the cache a fixed maximum size (sketched below). If you have 64GB of RAM, you could set --cache-ram 32.0, and once your system memory hits 32GB it will start purging the cache. If you aren't using --cache-ram, it will blindly keep filling up your RAM plus pagefile. Realistically, it needs to automatically start clearing old entries based on your physical system memory, not total RAM+pagefile.
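
To make the purge behavior concrete, here is a minimal sketch of a RAM-capped, oldest-first cache of the kind described above. This is illustrative only, not ComfyUI's actual implementation; the psutil dependency and the 32GB threshold are assumptions mirroring --cache-ram 32.0.

```python
# Illustrative sketch of a RAM-capped node-output cache (not ComfyUI's real code).
from collections import OrderedDict

import psutil  # assumed dependency, used here to read system memory usage

CACHE_RAM_GB = 32.0  # mirrors launching with --cache-ram 32.0


class RamCappedCache:
    def __init__(self):
        self.entries = OrderedDict()  # insertion order doubles as age order

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        return None

    def put(self, key, value):
        self.entries[key] = value
        # Purge oldest entries while system memory usage exceeds the cap.
        while self.entries and psutil.virtual_memory().used / 1024**3 > CACHE_RAM_GB:
            self.entries.popitem(last=False)
```

With the real flag, that corresponds to launching something like `python main.py --cache-ram 32.0` (entry-point name assumed; adjust to however you start ComfyUI).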

RandomGitUser321 • Dec 13 '25 13:12

After using --cache-ram 5.0, when I run the workflow above without changing the node inputs, sometimes the node gets the cached result and sometimes the cache is released and the nodes run again. But whenever the nodes run, the committed memory still increases and does not decrease after the workflow finishes. If I run it several times without restarting ComfyUI, memory is still eventually used up and the following runtime error is raised. @RandomGitUser321

Logs:

got prompt
!!! Exception during processing !!! [enforce fail at alloc_cpu.cpp:121] data. DefaultCPUAllocator: not enough memory: you tried to allocate 11612160000 bytes.
Traceback (most recent call last):
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 515, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 329, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 303, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 291, in process_inputs
    result = f(**inputs)
  File "D:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1898, in generate
    return (torch.cat((r, g, b), dim=-1), )
RuntimeError: [enforce fail at alloc_cpu.cpp:121] data. DefaultCPUAllocator: not enough memory: you tried to allocate 11612160000 bytes.

Prompt executed in 10.43 seconds
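
As a cross-check, the failed allocation size matches one large float32 image batch exactly. For example (the batch size and resolution below are assumptions for illustration; only the byte count comes from the log):

```python
# One hypothetical shape whose float32 size matches the failed allocation:
# a [B, H, W, C] image batch, the layout ComfyUI uses for IMAGE tensors.
batch, height, width, channels = 700, 1080, 1280, 3  # assumed values
bytes_needed = batch * height * width * channels * 4  # 4 bytes per float32
print(bytes_needed)  # 11612160000, exactly the amount in the error above
```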

SilverJim • Dec 13 '25 21:12

I used --cache-none to disable the cache and the problem still exists. @RandomGitUser321

SilverJim • Dec 14 '25 02:12

Yeah, I'm seeing this issue on my end now. I recreated your workflow, ran it, incremented the color, ran it, rinse and repeat, and it eventually ran out of memory.

If you're using ComfyUI-Manager, you can click the "Free model and node cache" button (there are two buttons; it's the right one, not the "Unload models" button) and it does correctly free everything back up again. So it appears that things are being tracked correctly; they just aren't prompting the ComfyUI memory-management system to clean them up. It's not quite a memory leak, but it's close to one.
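
If you want to trigger the same cleanup without the Manager, that button should just be hitting ComfyUI's HTTP API. A sketch (endpoint and payload as I understand them, so verify against your server.py; the address assumes the default local server):

```python
# Trigger ComfyUI's model/cache cleanup over its HTTP API.
# Assumes the default local server address; adjust if you changed --port.
import requests

resp = requests.post(
    "http://127.0.0.1:8188/free",
    json={"unload_models": True, "free_memory": True},  # free models and node cache
)
print(resp.status_code)  # 200 means the server accepted the request
```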

RandomGitUser321 • Dec 14 '25 06:12

@RandomGitUser321

It does not work for me. When I click the "Free model and node cache" button in ComfyUI-Manager, a "Disabling intermediate node cache" line shows up in the command line, and active memory goes down about a minute after I click the button, but the commit size does not go down and stays very high. ComfyUI or the workflow can still fail simply because the commit size is used up.

How do you reduce the commit size of ComfyUI? Did you test this in a Windows environment?

The following image shows the status after I ran the workflow several times, clicked the "Free model and node cache" button, and waited until active memory went down.

In the following images, the 93GB in "93GB/112GB" is the total commit size of all processes in my OS; it was about 40GB before I ran the workflow. The 48,677,548K (about 48GB) is the commit size of the ComfyUI process; it was less than 10GB before I ran the workflow.
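
For reference, here is one way to read the same numbers from Python instead of Task Manager (a sketch assuming psutil is installed; on Windows, psutil exposes the per-process commit charge as `private` and the working set as `rss`):

```python
# Log the ComfyUI process's commit size (Private Bytes) on Windows.
# Sketch only; assumes `psutil` is installed and you know the ComfyUI PID.
import psutil

COMFYUI_PID = 12345  # hypothetical: replace with the real PID from Task Manager

info = psutil.Process(COMFYUI_PID).memory_info()
# On Windows, `private` is the commit charge; `rss` is the working set.
print(f"commit: {info.private / 1024**3:.1f} GB, "
      f"working set: {info.rss / 1024**3:.1f} GB")
```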

SilverJim • Dec 14 '25 08:12

I tried the workflow above using ComfyUI in Docker under WSL2, and the same problem did not occur. So I think it is very likely that either:

  1. the memory leak only occurs on Windows, or
  2. there is no memory leak, and "Free model and node cache" is simply useless on Windows.

@RandomGitUser321

SilverJim • Dec 15 '25 05:12

I found that the issue was caused by my Python environment, which I had updated from a very old version. After I installed a brand-new ComfyUI, the issue was gone.

SilverJim • Dec 16 '25 07:12

> I found that the issue was caused by my Python environment, which I had updated from a very old version. After I installed a brand-new ComfyUI, the issue was gone.

What version were you on, and what version did you go to? My tests were on Python 3.13. Maybe it's a torch thing that caused it? I checked the code for EmptyImage and it uses torch for this.
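
If it helps to compare, a quick snippet to capture the relevant versions from both the old and the fresh environment (minimal sketch; assumes torch imports in both):

```python
# Print the interpreter and torch versions to compare environments.
import sys
import torch

print("python:", sys.version)
print("torch:", torch.__version__)
```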

RandomGitUser321 • Dec 16 '25 15:12

I installed ComfyUI in January 2024 and have updated it several times since, so I can't remember the update details. The Python version right now is 3.12.9.

SilverJim • Dec 18 '25 13:12