
After updating to version 0.3.60, KSampler keeps getting stuck when running a simple workflow.

muyifeiyang opened this issue 4 months ago · 9 comments

Custom Node Testing

Expected Behavior

After updating to version 0.3.60, KSampler keeps getting stuck when running a simple workflow.

Image

Actual Behavior

Image

Steps to Reproduce

Image

Debug Logs

Adding extra search path checkpoints /mnt/sd-models/models/Stable-diffusion
Adding extra search path configs /mnt/sd-models/models/Stable-diffusion
Adding extra search path vae /mnt/sd-models/models/VAE
Adding extra search path loras /mnt/sd-models/models/Lora
Adding extra search path loras /mnt/sd-models/models/LyCORIS
Adding extra search path upscale_models /mnt/sd-models/models/ESRGAN
Adding extra search path upscale_models /mnt/sd-models/models/RealESRGAN
Adding extra search path upscale_models /mnt/sd-models/models/SwinIR
Adding extra search path embeddings /mnt/sd-models/embeddings
Adding extra search path hypernetworks /mnt/sd-models/models/hypernetworks
Adding extra search path controlnet /mnt/sd-models/models/ControlNet
Set cuda device to: 3
Checkpoint files will always be loaded safely.
Total VRAM 24217 MB, total RAM 257580 MB
pytorch version: 2.8.0+cu128
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
Using pytorch attention
Python version: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]
ComfyUI version: 0.3.60
ComfyUI frontend version: 1.26.13
[Prompt Server] web root: /home/ai/anaconda3/envs/comfyui-new/lib/python3.10/site-packages/comfyui_frontend_package/static
Skipping loading of custom nodes
Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server

To see the GUI go to: http://0.0.0.0:8088
got prompt
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load FluxClipModel_
loaded completely 22207.3 4777.53759765625 True
Requested to load Flux
loaded completely 17133.029314605712 11350.067443847656 True
  0%|                                                                                                                                                    | 0/20 [00:00<?, ?it/s]
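
The run stalls at 0/20 sampler steps, which suggests the first sampling kernel never returns. As a sanity check outside ComfyUI, a short PyTorch snippet can confirm whether the GPU still completes work at all (a sketch, assuming the same conda environment; cuda:0 here maps to the card selected via --cuda-device):

import torch

# Standalone GPU sanity check, not part of ComfyUI: a wedged GPU would
# hang at synchronize() the same way KSampler hangs at step 0.
print(torch.__version__, torch.version.cuda)
print(torch.cuda.is_available(), torch.cuda.device_count())
print(torch.cuda.get_device_name(0))

x = torch.randn(2048, 2048, device="cuda")
y = x @ x                   # small matmul, similar to sampling workloads
torch.cuda.synchronize()    # blocks until the kernel actually finishes
print("GPU matmul finished:", float(y.sum()))

If this completes immediately, the hang is more likely in ComfyUI's sampling path than in the driver or the card itself.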

Other

No response

muyifeiyang · Oct 11 '25 01:10

Does it work if you launch with python3 main.py --cache-none?

christian-byrne · Oct 11 '25 02:10

Does it work if you launch with python3 main.py --cache-none?

Thank you for your reply. I added the --cache-none parameter, but it didn't work.

muyifeiyang · Oct 11 '25 03:10

Does it work if you launch with python3 main.py --cache-none?

The console logs:

(comfyui-new) ai@debian:/mnt/ai_proj/ComfyUI$ python main.py --listen 0.0.0.0 --port 8088 --multi-user --cuda-device 3 --disable-all-custom-nodes --cache-none
Adding extra search path checkpoints /mnt/sd-models/models/Stable-diffusion
Adding extra search path configs /mnt/sd-models/models/Stable-diffusion
Adding extra search path vae /mnt/sd-models/models/VAE
Adding extra search path loras /mnt/sd-models/models/Lora
Adding extra search path loras /mnt/sd-models/models/LyCORIS
Adding extra search path upscale_models /mnt/sd-models/models/ESRGAN
Adding extra search path upscale_models /mnt/sd-models/models/RealESRGAN
Adding extra search path upscale_models /mnt/sd-models/models/SwinIR
Adding extra search path embeddings /mnt/sd-models/embeddings
Adding extra search path hypernetworks /mnt/sd-models/models/hypernetworks
Adding extra search path controlnet /mnt/sd-models/models/ControlNet
Set cuda device to: 3
Checkpoint files will always be loaded safely.
Total VRAM 24217 MB, total RAM 257580 MB
pytorch version: 2.8.0+cu128
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
Using pytorch attention
Python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
ComfyUI version: 0.3.64
ComfyUI frontend version: 1.27.10
[Prompt Server] web root: /home/ai/anaconda3/envs/comfyui-new/lib/python3.10/site-packages/comfyui_frontend_package/static
Skipping loading of custom nodes
Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Disabling intermediate node cache.
Starting server

To see the GUI go to: http://0.0.0.0:8088
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SD1ClipModel
loaded completely 22207.3 235.84423828125 True
Requested to load BaseModel
loaded completely 21871.33067779541 1639.406135559082 True
  0%|                                                  | 0/20 [00:00<?, ?it/s]

muyifeiyang · Oct 11 '25 03:10

Does a super light model like SD1.5 work? Also, when the workflow gets stuck, could you run nvidia-smi in a terminal and post the results here?
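
If the UI is frozen, a small poller makes it easy to capture those readings once per second while the node is stuck (a sketch in Python; the nvidia-smi query flags are standard, but adjust the GPU index for your setup):

import subprocess, time

# Print GPU memory and utilization once per second; Ctrl+C to stop.
while True:
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout.strip()
    print(time.strftime("%H:%M:%S"), out)
    time.sleep(1)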

Kosinkadink · Oct 11 '25 04:10

Does a super light model like SD1.5 work? Also, when the workflow gets stuck, could you run nvidia-smi in a terminal and post the results here?

Image

Image

muyifeiyang · Oct 11 '25 08:10

Does a super light model like SD1.5 work? Also, when the workflow gets stuck, could you run nvidia-smi in a terminal and post the results here?

When running the lightweight SD1.5 model, KSampler also gets stuck.

muyifeiyang · Oct 11 '25 08:10

ComfyUI Version: v0.3.64
Pytorch Version: 2.7.0+cu128
Python Version: 3.12.9 (main, Feb 12 2025, 14:52:31) [MSC v.1942 64 bit (AMD64)]

I can confirm that recently, probably since an update of ComfyUI (v0.3.64), KSampler has become very slow and “Sampler Custom Advanced” gets stuck.

I therefore updated my Nvidia drivers from 572.83 to 581.42, and now there has been a noticeable improvement: KSampler has become faster and “Sampler Custom Advanced” no longer freezes, but it is still slower than I remember it being.

Update: ComfyUI has definitely become more resource-intensive. Other programs running in parallel are more prone to stuttering. This was not the case before.

ZurrealZ · Oct 12 '25 00:10

ComfyUI Version: v0.3.63
pytorch version: 2.8.0+cu129
Python version: 3.11.13 | packaged by conda-forge | (main, Jun 4 2025, 14:48:23) [GCC 13.3.0]

^ This is what I'm using right now, and I'm probably having the same issue. I might add that in my case this only happens when the model is run from a Docker container. It doesn't seem to be PyTorch- or CUDA-related: I've tried both my own and prebuilt images with cu128-130 and get the same behaviour. I also updated the drivers on the host system, which made it work properly for 3-4 runs, but then the problem returned.

As soon as the workflow reaches the KSampler node, about 8 GB of VRAM gets filled instantly; it then slowly loads more data into VRAM and never really stops, eventually starting to use shared memory and making the host system unusable. It also doesn't matter which model is used: I get the exact same behaviour with the SD1.5 model from the txt2img example and with SDXL models. Interestingly, this doesn't seem to happen with the Windows standalone build.
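
One way to tell whether that growth is real tensor data or just blocks held by PyTorch's caching allocator is to print the allocator statistics from inside the running process (a sketch; it assumes you have some way to execute code in that process, for example a debug hook or a trivial custom node):

import torch

# Distinguish live allocations from memory merely reserved by the caching
# allocator; a large reserved-vs-allocated gap points at caching behaviour.
print(torch.cuda.memory_allocated() / 2**20, "MiB allocated")
print(torch.cuda.memory_reserved() / 2**20, "MiB reserved")
print(torch.cuda.memory_summary(abbreviated=True))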

Muniwedesu · Oct 12 '25 10:10

I have the same issue; did anyone find a fix? I've been trying to fix this for several days now. I'm going to revert to the prior ComfyUI version.

@muyifeiyang: does the flux-fp8 checkpoint have a VAE inside the file? I don't think it does, so you may have to use a VAE Loader node and feed it the ae.safetensors file (the VAE file). I don't use FP8, so I am not 100% sure. You may also have to use the DualCLIPLoader, download the T5... and clip_l models, and put them in their appropriate folders.
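
For illustration only, that loader setup would look roughly like this in API-format workflow terms, written here as a Python dict (node ids and file names are placeholders; the exact Flux text-encoder filenames depend on what you downloaded):

# Hypothetical fragment of an API-format prompt: load the Flux VAE and the
# two text encoders from separate files instead of the fp8 checkpoint.
prompt_fragment = {
    "10": {"class_type": "VAELoader",
           "inputs": {"vae_name": "ae.safetensors"}},
    "11": {"class_type": "DualCLIPLoader",
           "inputs": {"clip_name1": "t5xxl_fp16.safetensors",  # placeholder T5 file
                      "clip_name2": "clip_l.safetensors",
                      "type": "flux"}},
}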

leftenantbeige · Nov 11 '25 10:11