Add env TORCH_AMD_CUDNN_ENABLED (third try)

Open · comfy-ovum opened this issue 3 months ago · 8 comments

To offset the substantial effects of https://github.com/comfyanonymous/ComfyUI/pull/10302, this PR provides (and informs the user of) an environment variable that can be set to undo that PR's unilateral decision to disable cuDNN for all AMD users.

> nothing else in comfy uses env vars to enable/disable stuff

@comfyanonymous

You keep insisting that because cudnn=False runs faster for you, it must therefore be forced on everyone. That is not engineering. That is theology.

Let us review what you have done. Your pull request simply hard-codes torch.backends.cudnn.enabled = False for all RDNA3 and RDNA4 users. You wrote that you "have no idea why it helps but it does" on your system. That may be true for your test box, your driver, your kernel. But issue #10447 shows another user whose performance collapsed the moment cudnn was disabled. Issue #10460 shows the same pattern. For them, your patch breaks what once worked. That alone should end this argument: if a change helps some and harms others, the correct path is configurability, not decree.

Then you said "nothing else in Comfy uses env vars to enable or disable stuff." False. Comfy already reads them: COMFYUI_DIR, proxies, path expansions, HTTP settings. Users have repeatedly asked for .env support and config overrides. Pretending this tool never touches environment variables is historical revisionism, not justification. The absence of precedent is not a reason to block a useful one.

ComfyUI runs on wildly different hardware and software combinations: Linux, Windows, ROCm 6.x, Torch 2.x, FlashAttention, tuned kernels, patched builds. The very nature of this ecosystem demands flexibility. A developer who locks a single behavior across such diversity is courting regression. Hardware changes. Drivers update. A fix today becomes a bottleneck tomorrow. Your forced flag will age like milk.

The purpose of an env var is precisely this: to give users an escape hatch when automatic detection fails or when blanket assumptions crumble. A flag such as COMFYUI_CUDNN_ENABLED=1 or 0 would let everyone test, measure, and choose without touching the source. It adds no maintenance cost. It adds resilience. It adds honesty.
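
As a rough illustration of how small this escape hatch is (a sketch, not the actual patch; the variable name follows the proposal above, and the parsing details are my own assumptions):

```python
import os

import torch

# Sketch of the proposed escape hatch; the actual PR may differ in detail.
# The default keeps the current upstream behavior (cuDNN off on RDNA3/RDNA4),
# but COMFYUI_CUDNN_ENABLED=1 lets a user opt back in without editing source.
env_value = os.environ.get("COMFYUI_CUDNN_ENABLED")
if env_value is not None:
    torch.backends.cudnn.enabled = env_value.strip().lower() not in ("0", "false")
else:
    torch.backends.cudnn.enabled = False  # current hard-coded default for these GPUs
    print("cudnn disabled for AMD; set COMFYUI_CUDNN_ENABLED=1 to re-enable.")
```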

If you truly believe in your optimization, you can keep it as the default, as this PR allows. It simply adds a message and the required functionality to support it:

"cudnn disabled for AMD; set COMFYUI_CUDNN_ENABLED=1 to re-enable."

That informs without coercing. That is how grown-up software behaves.

Right now, your stance is that your environment defines truth. It does not. It defines your truth. And until you allow others to define theirs, what you are enforcing is not a performance improvement, but a constraint masquerading as genius.

comfy-ovum · Nov 07 '25 17:11

Related to this subject: the cuDNN issue might be close to being solved in the near future, at least with torch nightly builds, once they are compiled with cuDNN 9.15. Keeping an eye on some discussions in the PyTorch repository, I came to the conclusion that, beginning with stable 2.9.1, there may even be a way to use a separate cuDNN 9.15 installation instead of the one compiled into torch. Exactly how that will work is above my level of knowledge. I opened an issue at the PyTorch repository to ask for some clarification, because the subject is way over my head. I too would like to see the issue solved as soon as possible, so I am kinda anxious. https://github.com/pytorch/pytorch/issues/167242

jovan2009 · Nov 07 '25 22:11

Why make it an environment variable when you can just make it a launch arg like --force-cudnn-enabled?

In cli_args.py, you'd add something like: parser.add_argument("--force-cudnn-enabled", action="store_true", help="Force cuDNN to stay enabled even on GPUs where ComfyUI disables it.")

model_management.py already imports the args, so the override can sit next to the existing cuDNN logic.
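
Roughly, the wiring could look like this (a self-contained sketch; in ComfyUI the parser and args already exist, so only the new lines would be added):

```python
import argparse

import torch

# cli_args.py (sketch): in ComfyUI the parser already exists, so only
# the add_argument call below would be new.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--force-cudnn-enabled",
    action="store_true",
    help="Force cuDNN to stay enabled even on GPUs where ComfyUI disables it.",
)
args = parser.parse_args()

# model_management.py (sketch): it already imports args, so the override
# can live beside the code that currently hard-codes cudnn off.
if args.force_cudnn_enabled:
    torch.backends.cudnn.enabled = True
```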

RandomGitUser321 · Nov 10 '25 21:11

Update: today I managed to run ComfyUI with the latest cuDNN, version 9.16. I got a performance leap with one of my usual WAN 2.2 I2V workflows, from about 45-50 s/it to about 35 s/it. For details about what I did, see the PyTorch issue I linked in my previous post. TL;DR: I downloaded the latest cuDNN 9.16 from NVIDIA, installed it, and simply dragged and dropped the DLLs over the same DLLs in the torch folder, torch being yesterday's nightly 2.10.0.dev20251114+cu130. I also modified a number in the file ops.py; you can see exactly where at this commit: b4f30bd4087a79b4c4fc89bb67b9889adb866294. I put in a larger number, something like 91700, in order to skip over the current workaround. Instead of my PC exploding or being sucked into a black hole, it worked. :) Cheers! @comfy-ovum @comfyanonymous
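
For readers following along: the number being edited is, as I understand it, a cuDNN version threshold. A hypothetical sketch of that kind of gate (the names here are mine; the real check lives in ops.py at the commit above and may differ):

```python
import torch

# cuDNN reports its version as an integer, e.g. 91600 for 9.16.
# Raising the threshold above the installed version makes the
# conv3d workaround path unreachable, which is the edit described above.
CUDNN_FIXED_IN = 91500  # 9.15, where the underlying bug is reportedly fixed

def use_conv3d_workaround() -> bool:
    version = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    return version is not None and version < CUDNN_FIXED_IN
```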

jovan2009 · Nov 15 '25 15:11

> I got a performance leap with one of my usual WAN 2.2 I2V workflows from about 45-50 it/s to about 35 it/s.

45-50 it/s down to 35 it/s seems like a significant slowdown to me, not a speedup. But maybe my math is just wrong and works differently these days.

VladanZ · Nov 16 '25 11:11

@VladanZ You are right, I apparently mistyped; it is the other way around, s/it rather than it/s. In other words, this modification gives a shorter time per step than before. Thanks for pointing this out; I will make the correction. It was a WAN 2.2 + 4-step LoRA workflow; it would have been wonderful to get 45 or 35 steps per second.
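
For concreteness, since the units tripped us up, in s/it smaller is better, and the reported change works out to roughly a quarter less time per step:

```python
# seconds per iteration, taking the midpoint of the "before" range above
before_s_per_it = (45 + 50) / 2
after_s_per_it = 35
speedup = before_s_per_it / after_s_per_it     # ~1.36x faster
saving = 1 - after_s_per_it / before_s_per_it  # ~26% less time per step
print(f"{speedup:.2f}x faster, {saving:.0%} less time per step")
```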

jovan2009 · Nov 16 '25 12:11

Any idea if there could be such a workaround for Linux?

tapstoop · Dec 11 '25 08:12

If you look at the PyTorch issue linked above (and at the torch 2.9.1 release notes, https://github.com/pytorch/pytorch/releases), beginning with PyTorch 2.9.1 the "official" workaround was intended to be simply installing the latest nvidia-cudnn-cu13 Python package with pip, and PyTorch should then use that as its cuDNN. On Windows it didn't work for me, which is why I resorted to copying the cuDNN files over the PyTorch installation. But I assume it should work on Linux, at least?

Edit: note that if you manage to make PyTorch use the latest cuDNN, there is no longer any need to edit ComfyUI files on the current git version; it will not use the conv3d workaround on cuDNN >= 9.15.
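
A quick way to verify which cuDNN build torch actually picked up after either approach (pip package or copied DLLs); if the printed version is >= 91500 (i.e. 9.15), current ComfyUI git should skip the conv3d workaround:

```python
import torch

# True if torch can load a cuDNN library at all
print(torch.backends.cudnn.is_available())
# Integer version, e.g. 91600 for cuDNN 9.16
print(torch.backends.cudnn.version())
```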

jovan2009 · Dec 11 '25 11:12

This is about an AMD workaround. I don't think NVIDIA cuDNN or CUDA library versions are related.

The root cause could be a regression in MIOpen in the latest ROCm releases. I have confirmed that at least some operations have regressed since ROCm 6.4, though I don't think this particular operation is what motivated disabling cuDNN/MIOpen.

alexheretic · Dec 12 '25 01:12