
Black images in XL and Nunchaku (Kontext Dev) caused by KSampler outputting NaNs. Nothing works.

Open Aenon1 opened this issue 6 months ago • 13 comments

Custom Node Testing

Expected Behavior

ComfyUI makes an image.

Actual Behavior

That's all I've tried with this install. It seems pointless to try anything else.

Error:

ComfyUI\nodes.py:1585: RuntimeWarning: invalid value encountered in cast
  img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))

I have tried everything I could find about this, and nothing works. This is a new install (because it was doing it on the old one) on a fresh install of Windows 11. It is not a custom node's fault; I don't have any installed. The XL model is the same one that I have been using for almost a year with no problems until recently. It does it with all XL models. Comfy and the front end are up to date (installed and updated today). It is the stand-alone, not portable, version. It will sometimes make 3 or 4 images just fine, then it puts some artifacts that look like icons in an image, or huge spots of just noise, and then it starts with the black images.

Please tell me how to fix this. It is making ComfyUI completely unusable. My computer is an Alienware X17 R2 with 32 GB system RAM, an RTX 3080 Ti (16 GB VRAM), and 2 NVMe drives (512 GB OS, 2 TB secondary, where Comfy is).

Steps to Reproduce

Open a basic workflow and run it.

Debug Logs

Error: ComfyUI\nodes.py:1585: RuntimeWarning: invalid value encountered in cast
  img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))
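
For context on what that warning means: `np.clip` does not remove NaN values, so the NaN-filled buffer survives untouched until the `uint8` cast, which is the step that fires the RuntimeWarning and produces the black/garbage pixels. A minimal NumPy sketch (hypothetical 2x2 buffer, not ComfyUI code):

```python
import numpy as np

# Hypothetical 2x2 float buffer poisoned with NaN, like the decoded output
i = np.full((2, 2, 3), np.nan, dtype=np.float32)

# np.clip passes NaN straight through -- it does not sanitize the buffer
clipped = np.clip(i, 0, 255)
print(np.isnan(clipped).all())  # True

# ...so the uint8 cast is the step that triggers
# "RuntimeWarning: invalid value encountered in cast".
# Sanitizing first with np.nan_to_num gives a (black) but defined image:
safe = np.nan_to_num(clipped, nan=0.0).astype(np.uint8)
print(safe.max())  # 0
```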

Other

No response

Aenon1 avatar Jul 25 '25 14:07 Aenon1

I have the same issue, and the common fix for black images, --force-upcast-attention (as mentioned here: https://github.com/comfyanonymous/ComfyUI/issues/4241), does not work.

Njaecha avatar Jul 26 '25 00:07 Njaecha

Further information: The error that is shown in cmd is:

ComfyUI\nodes.py:1585: RuntimeWarning: invalid value encountered in cast
  img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))

Using --force-upcast-attention does not work. Neither does anything else I could find that is related to this problem.

I used an XL model, but it also was doing it with Flux Kontext. I switched to XL so I could test this with a much simpler workflow and find out where the NaNs were coming from.

I hooked Show Anything nodes to everything that it would let me. The KSampler output this:

{'samples': tensor([[[[nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      ...,
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan]],

     [[nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      ...,
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan]],

     [[nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      ...,
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan]],

     [[nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      ...,
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan],
      [nan, nan, nan,  ..., nan, nan, nan]]]])}

Settings for the KSampler node: seed: random (control before generate: randomize), steps: 4, cfg: 1, sampler: lcm, scheduler: simple, denoise: 1.
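
As an aside, the NaN check being done here by eye with Show Anything can be scripted. A sketch using NumPy as a stand-in for the torch tensor (with torch the equivalent test is `torch.isnan(t).any()`); `latent_is_poisoned` is a hypothetical helper, not a ComfyUI API:

```python
import numpy as np

def latent_is_poisoned(latent_dict):
    """Return True if the samples tensor contains any NaN or inf."""
    s = latent_dict["samples"]
    return bool(np.isnan(s).any() or np.isinf(s).any())

# Stand-ins for the {'samples': tensor(...)} dict the KSampler emits
bad = {"samples": np.full((1, 4, 128, 128), np.nan, dtype=np.float32)}
good = {"samples": np.zeros((1, 4, 128, 128), dtype=np.float32)}

print(latent_is_poisoned(bad))   # True
print(latent_is_poisoned(good))  # False
```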

This is the output from the VAE Decode node:

tensor([[[[nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan],
      ...,
      [nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan]],

     [[nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan],
      ...,
      [nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan]],

     [[nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan],
      ...,
      [nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan]],

     ...,

     [[nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan],
      ...,
      [nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan]],

     [[nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan],
      ...,
      [nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan]],

     [[nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan],
      ...,
      [nan, nan, nan],
      [nan, nan, nan],
      [nan, nan, nan]]]])
      

The Load Checkpoint node outputs this for all 3 outputs: <comfy.model_patcher.ModelPatcher object at 0x000001B1F345D340>

The Empty Latent Image node output this:

{'samples': tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]],

     [[0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      ...,
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.],
      [0., 0., 0.,  ..., 0., 0., 0.]]]])}

It is set to 1024x1024.
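
That output checks out: SD/SDXL latents have 4 channels at 1/8 of the pixel resolution, so a 1024x1024 empty latent is a [1, 4, 128, 128] block of zeros. In other words, the sampler's input is clean and the NaNs are produced during sampling. A quick sketch:

```python
import numpy as np

# SD/SDXL latent space: 4 channels at 1/8 the pixel resolution,
# so 1024x1024 pixels -> a [1, 4, 128, 128] latent of zeros.
w, h = 1024, 1024
samples = np.zeros((1, 4, h // 8, w // 8), dtype=np.float32)

print(samples.shape)  # (1, 4, 128, 128)
print(np.isfinite(samples).all())  # True -- the input latent is clean
```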

Why is it doing this on over half of the images, and how do I stop it?

It doesn't matter what model I use, and the Show Anything node is the only custom node in the workflow. It outputs black images without the Show Anything node as well.

System: Alienware X17 R2, Windows 11, RTX 3080 Ti (16 GB VRAM), 32 GB system RAM, i9 CPU, 2 NVMe SSDs. I am using the stand-alone (not desktop, not portable) version of ComfyUI with the latest front end. Both were updated before running this test. I am pulling from the master branch.

Info from cmd:

Checkpoint files will always be loaded safely.
Total VRAM 16384 MB, total RAM 32411 MB
pytorch version: 2.7.1+cu128
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3080 Ti Laptop GPU : cudaMallocAsync
Using pytorch attention
Python version: 3.12.9 (tags/v3.12.9:fdb8142, Feb 4 2025, 15:27:58) [MSC v.1942 64 bit (AMD64)]
ComfyUI version: 0.3.45
Initializing frontend: Comfy-Org/ComfyUI_frontend@prerelease, requesting version details from GitHub...

This is making ComfyUI unusable.

Aenon1 avatar Jul 27 '25 23:07 Aenon1

Let me add some information from my side: I'm using the same pytorch version, except on an RTX 5090. Python version is 3.12.7, using a venv. I'm on Comfy release 0.3.45, but have tried rolling back to 0.3.44 without success. However, and this might be important, I'm also using the pre-release frontend. I would be a little surprised if it's something related to the frontend, but it might be worth checking out.

Njaecha avatar Jul 28 '25 18:07 Njaecha

Njaecha, I don't think it could be the front end. The KSampler is throwing out nothing but NaNs (not a number). Nothing that has been mentioned as a fix, and I've gone back over a year, has worked.

Aenon1 avatar Jul 28 '25 20:07 Aenon1

Do you have Sage Attention installed? It's the same error some are getting with Qwen Image: https://github.com/comfyanonymous/ComfyUI/issues/9077

Disabling sage attention is a workaround, but not really a fix.

phazei avatar Aug 10 '25 09:08 phazei

I'm getting the same errors with xformers attention and fp16 with Qwen Image; I will try the fixes mentioned.

gilbrotheraway avatar Aug 10 '25 19:08 gilbrotheraway

I can confirm that fp32 fixes the black-image issue, at the expense of about 15% speed for me.

gilbrotheraway avatar Aug 10 '25 20:08 gilbrotheraway

The same error occurs only on Qwen (everything works on other models). The image goes black after 50% of the steps (visible in the preview). Disabling sage_attention in the bat file fixes the black screen, but then there is colorful garbage instead. Disabling fast in the bat file fixes that problem.

KLL535 avatar Aug 10 '25 22:08 KLL535

I don't use Qwen, so I don't know about that. I was getting the errors before I installed Triton and Sage Attention 2, and I am still getting them after installing them. When I use the Flux models with Nunchaku, I don't get the errors. When I use a regular Flux checkpoint, I get them. SDXL is hit and miss: sometimes it will make a few images and then a run of 2 to 4 black outputs. I get the same error with every black image, no matter which model I am using:

ComfyUI\nodes.py:1585: RuntimeWarning: invalid value encountered in cast
  img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))

Viewing the output with a Show Anything node shows that the KSampler is outputting NaNs.

Aenon1 avatar Aug 11 '25 00:08 Aenon1

More info: I've found out that if I make SDXL images at 512x512 resolution, it works fine. If I increase that size (768x768 or 640x480), I get at least 3 black images out of a batch of 9. If I add a 2nd KSampler to the workflow (denoise = 0.2), half or more of the images are black, even at 512x512.

Aenon1 avatar Aug 12 '25 21:08 Aenon1

I am also not getting the error "Array must not contain infs or NaNs". Is there any way to stop this from getting to the KSampler and causing a black image? I would rather see no image than have to run through a multi-image batch to find and delete the useless images. This does not happen for every image; out of a batch of 9, 3 or 4 will be black. It is also not just SDXL; it happens in Flux and Wan as well.
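
One way to avoid sifting a finished batch by hand is to filter out non-finite frames before saving. A NumPy sketch with a hypothetical 4-image batch (not a ComfyUI node):

```python
import numpy as np

# Hypothetical batch of 4 float images, two poisoned with NaN as in the report
batch = [np.random.rand(8, 8, 3).astype(np.float32) for _ in range(4)]
batch[1][:] = np.nan
batch[3][:] = np.nan

# Keep only frames whose every pixel is finite, instead of saving black images
good = [img for img in batch if np.isfinite(img).all()]
print(len(good))  # 2
```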

Aenon1 avatar Sep 01 '25 23:09 Aenon1

I found a way around this issue. Search the Manager for ComfyUI-KJNodes (GitHub: https://github.com/kijai/ComfyUI-KJNodes). Dump Comfy's Load Checkpoint node and replace it with CheckpointLoaderKJ. I use it in every workflow that calls for Comfy's checkpoint loader, and I am not having this issue any more. The image shows the settings I use for it.

[Image: CheckpointLoaderKJ settings]

Aenon1 avatar Oct 27 '25 06:10 Aenon1

Update: a forced beta feature update from Comfy (I don't know which one) has broken all of the KJ checkpoint loaders, so I am back to dealing with Comfy's checkpoint loader, and back to the same black image/video output.

Aenon1 avatar Dec 21 '25 04:12 Aenon1