After updating to version 0.3.60, the KSampler node keeps getting stuck when running a simple workflow.
Custom Node Testing
- [x] I have tried disabling custom nodes and the issue persists (see how to disable custom nodes if you need help)
Expected Behavior
The workflow should run to completion, as it did before the update.
Actual Behavior
After updating to version 0.3.60, KSampler gets stuck at 0% (0/20 steps) when running a simple workflow.
Steps to Reproduce
Load a basic Flux workflow and queue a prompt; sampling hangs at the first KSampler step (see the logs below).
Debug Logs
Adding extra search path checkpoints /mnt/sd-models/models/Stable-diffusion
Adding extra search path configs /mnt/sd-models/models/Stable-diffusion
Adding extra search path vae /mnt/sd-models/models/VAE
Adding extra search path loras /mnt/sd-models/models/Lora
Adding extra search path loras /mnt/sd-models/models/LyCORIS
Adding extra search path upscale_models /mnt/sd-models/models/ESRGAN
Adding extra search path upscale_models /mnt/sd-models/models/RealESRGAN
Adding extra search path upscale_models /mnt/sd-models/models/SwinIR
Adding extra search path embeddings /mnt/sd-models/embeddings
Adding extra search path hypernetworks /mnt/sd-models/models/hypernetworks
Adding extra search path controlnet /mnt/sd-models/models/ControlNet
Set cuda device to: 3
Checkpoint files will always be loaded safely.
Total VRAM 24217 MB, total RAM 257580 MB
pytorch version: 2.8.0+cu128
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
Using pytorch attention
Python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
ComfyUI version: 0.3.60
ComfyUI frontend version: 1.26.13
[Prompt Server] web root: /home/ai/anaconda3/envs/comfyui-new/lib/python3.10/site-packages/comfyui_frontend_package/static
Skipping loading of custom nodes
Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server
To see the GUI go to: http://0.0.0.0:8088
got prompt
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load FluxClipModel_
loaded completely 22207.3 4777.53759765625 True
Requested to load Flux
loaded completely 17133.029314605712 11350.067443847656 True
0%| | 0/20 [00:00<?, ?it/s]
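(Side note on the log above: "Set cuda device to: 3" followed by "Device: cuda:0" is expected rather than a second bug. The --cuda-device flag appears to work by restricting CUDA_VISIBLE_DEVICES, so the single visible GPU is re-indexed as cuda:0 inside the process. A minimal sketch of that remapping, assuming the variable is set before torch initializes CUDA:)

```python
import os

# Assumption: ComfyUI's --cuda-device restricts GPU visibility through this
# environment variable; it must be set before torch touches CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

import torch

# Physical GPU 3 is now the only visible device, re-indexed as cuda:0.
print(torch.cuda.device_count())      # 1
print(torch.cuda.current_device())    # 0
print(torch.cuda.get_device_name(0))  # reports physical GPU 3's name
```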
Other
No response
Does it work if you launch with python3 main.py --cache-none?
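(For context: --cache-none disables ComfyUI's caching of intermediate node outputs, so every node re-executes on each run; the log further below confirms it took effect via the "Disabling intermediate node cache." line. The suggestion tests whether stale cached outputs, rather than the sampler itself, are behind the hang.)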
Thank you for your reply. I added the --cache-none parameter, but it didn't work.
The console logs:
(comfyui-new) ai@debian:/mnt/ai_proj/ComfyUI$ python main.py --listen 0.0.0.0 --port 8088 --multi-user --cuda-device 3 --disable-all-custom-nodes --cache-none
Adding extra search path checkpoints /mnt/sd-models/models/Stable-diffusion
Adding extra search path configs /mnt/sd-models/models/Stable-diffusion
Adding extra search path vae /mnt/sd-models/models/VAE
Adding extra search path loras /mnt/sd-models/models/Lora
Adding extra search path loras /mnt/sd-models/models/LyCORIS
Adding extra search path upscale_models /mnt/sd-models/models/ESRGAN
Adding extra search path upscale_models /mnt/sd-models/models/RealESRGAN
Adding extra search path upscale_models /mnt/sd-models/models/SwinIR
Adding extra search path embeddings /mnt/sd-models/embeddings
Adding extra search path hypernetworks /mnt/sd-models/models/hypernetworks
Adding extra search path controlnet /mnt/sd-models/models/ControlNet
Set cuda device to: 3
Checkpoint files will always be loaded safely.
Total VRAM 24217 MB, total RAM 257580 MB
pytorch version: 2.8.0+cu128
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
Using pytorch attention
Python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
ComfyUI version: 0.3.64
ComfyUI frontend version: 1.27.10
[Prompt Server] web root: /home/ai/anaconda3/envs/comfyui-new/lib/python3.10/site-packages/comfyui_frontend_package/static
Skipping loading of custom nodes
Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Disabling intermediate node cache.
Starting server
To see the GUI go to: http://0.0.0.0:8088
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SD1ClipModel
loaded completely 22207.3 235.84423828125 True
Requested to load BaseModel
loaded completely 21871.33067779541 1639.406135559082 True
0%| | 0/20 [00:00<?, ?it/s]
Does a super light model like SD1.5 work? Also, when the workflow gets stuck, could you run nvidia-smi in a terminal and post the results here?
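If catching the state mid-hang by hand is awkward, a small polling loop can record VRAM over time instead. A minimal sketch (a hypothetical helper script, not part of ComfyUI), assuming nvidia-smi is on the PATH:

```python
import subprocess
import time

# Poll nvidia-smi once per second and print one CSV line per GPU, so the
# VRAM level during the KSampler hang is captured without watching the terminal.
QUERY = [
    "nvidia-smi",
    "--query-gpu=timestamp,index,memory.used,memory.total,utilization.gpu",
    "--format=csv,noheader",
]

while True:
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    print(result.stdout.strip(), flush=True)
    time.sleep(1)
```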
When running the lightweight SD1.5 model, KSampler also gets stuck.
ComfyUI Version: v0.3.64
Pytorch Version: 2.7.0+cu128
Python Version: 3.12.9 (main, Feb 12 2025, 14:52:31) [MSC v.1942 64 bit (AMD64)]
I can confirm that recently, probably since an update of ComfyUI (v0.3.64), KSampler has become very slow and “Sampler Custom Advanced” gets stuck.
I therefore updated my Nvidia drivers from 572.83 to 581.42, and now there has been a noticeable improvement: KSampler has become faster and “Sampler Custom Advanced” no longer freezes, but it is still slower than I remember it being.
Update: ComfyUI has definitely become more resource-intensive. Other programs running in parallel are more prone to stuttering. This was not the case before.
ComfyUI Version: v0.3.63
pytorch version: 2.8.0+cu129
Python version: 3.11.13 | packaged by conda-forge | (main, Jun 4 2025, 14:48:23) [GCC 13.3.0]
^ this is what I'm using right now. I'm probably having the same issue. I might add that in my case this only happens when the model is run from a Docker container; it doesn't seem to be PyTorch- or CUDA-related, since I've tried both my own and prebuilt images with cu128 through cu130 and get the same behaviour. I also updated my drivers on the host system, which made it work properly for 3-4 runs, but then it started happening again.

As soon as the workflow reaches the KSampler node, about 8 GB of VRAM fills instantly, then it slowly loads more data into VRAM and never really stops, eventually spilling into shared memory and making the host system unusable. It also doesn't matter which model is used: I get the exact same behaviour with the SD1.5 model from the txt2img example and with SDXL models. Interestingly, this doesn't seem to happen on the Windows standalone build.
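One way to narrow down a climb like this is to compare PyTorch's own counters against the driver's numbers while the sampler is stuck: if "reserved" grows while "allocated" stays flat, the caching allocator is expanding its pool; if both climb together, tensors really are accumulating every step. A minimal diagnostic sketch (hypothetical, not part of ComfyUI):

```python
import torch

# Report PyTorch's view of VRAM next to the device total.
# memory_allocated = live tensors; memory_reserved = allocator pool.
def report_vram(device: int = 0) -> None:
    mib = 2 ** 20
    allocated = torch.cuda.memory_allocated(device) / mib
    reserved = torch.cuda.memory_reserved(device) / mib
    total = torch.cuda.get_device_properties(device).total_memory / mib
    print(f"cuda:{device} allocated={allocated:.0f} MiB "
          f"reserved={reserved:.0f} MiB total={total:.0f} MiB")

report_vram()
```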
I have the same issue. Did anyone find a fix? I've been trying to fix this for several days now. I'm going to revert to the prior ComfyUI version.

@muyifeiyang -- does the flux-fp8 checkpoint have a VAE inside the file? I don't think it does; you may have to use a VAE Loader node and feed it the ae.safetensors file (the VAE file). I don't use FP8, so I'm not 100% sure. You may have to use the DualCLIPLoader too, and download the T5... and clip_l -- put them in their appropriate folders.