
py3.13 torch2.9 cu130 slow torch.compile

Open Blackjack356 opened this issue 2 months ago • 5 comments

Custom Node Testing

Your question

My old ComfyUI setup uses torch 2.7.1, cu124, and py3.12 with the torch compile Wan v2 node from KJNodes, but when I use the new ComfyUI Portable 0.3.68 with all the new stuff, I can't get any benefit from torch.compile; it's much slower than the old setup. I asked GLM-4.6 and GPT, and they said it might be because torch 2.9 and py3.13 are newer and less optimized, but I don't know. Anyone have the same issue? I also looked through the existing issues, but the oldest related post is from 2-3 months ago.
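A minimal sketch to compare eager vs compiled timings on both setups. It assumes a CUDA build of PyTorch; the toy MLP is a stand-in for the real Wan model, so only the relative trend is meaningful, not the absolute numbers:

```python
import time
import torch

# Eager-vs-compiled timing check. Assumes a CUDA build of PyTorch; the
# toy MLP below is a placeholder for the real Wan model, so only the
# relative trend matters, not the absolute numbers.
device = "cuda"
dtype = torch.float16
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).to(device=device, dtype=dtype)
x = torch.randn(64, 1024, device=device, dtype=dtype)

@torch.no_grad()
def bench(fn, warmup=5, iters=50):
    for _ in range(warmup):  # warmup also absorbs the one-time compile cost
        fn(x)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters * 1e3  # ms per forward

print(f"eager:    {bench(model):.3f} ms")
print(f"compiled: {bench(torch.compile(model)):.3f} ms")
```

If the compiled timing is slower than eager here too, the regression is in the torch 2.9 / py3.13 stack itself rather than in ComfyUI or the KJNodes node.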

Logs


Other

No response

Blackjack356 avatar Nov 11 '25 03:11 Blackjack356

I'm experiencing a similar issue with ComfyUI Portable 0.3.68, py3.13.6, torch 2.9, and cu130. I've also noticed that VRAM usage has increased compared to older versions, exceeding the 24 GB limit. I suspect this is why the workflow has become significantly slower.

SakuragiA avatar Nov 11 '25 11:11 SakuragiA

I finally found the cause of the VRAM usage exceeding limits: it was NVIDIA's latest graphics driver (version 581.80, released on November 4, 2025). Everything returned to normal after I rolled back to version 580.97, released on August 12, 2025. I apologize to ComfyUI for mistakenly blaming it; now everything works perfectly with ComfyUI 0.3.68, Python 3.13.9, and PyTorch 2.9 + CUDA 13.0 Update 2.
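For anyone checking whether they're hitting the same symptom, here is a quick sketch to watch VRAM from the Python side while a workflow runs. These are standard torch.cuda calls; nvidia-smi reports the same driver-level numbers from outside the process:

```python
import torch

# Snapshot GPU memory to spot usage creeping past the physical limit.
# free/total come straight from the driver; allocated/reserved are what
# PyTorch itself is holding in its caching allocator.
free, total = torch.cuda.mem_get_info(0)
allocated = torch.cuda.memory_allocated(0)
reserved = torch.cuda.memory_reserved(0)
gib = 1024 ** 3
print(f"total: {total / gib:.1f} GiB | free: {free / gib:.1f} GiB | "
      f"torch allocated: {allocated / gib:.1f} GiB | "
      f"torch reserved: {reserved / gib:.1f} GiB")
```

A driver-level usage figure far above PyTorch's reserved pool would point at something outside the allocator, such as the driver behavior described above.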

SakuragiA avatar Nov 13 '25 03:11 SakuragiA

Does it speed up when using torch.compile? Because I don't have a VRAM problem. I will test the rollback, though.

Blackjack356 avatar Nov 13 '25 07:11 Blackjack356

@Blackjack356 I haven't checked with the latest Comfy updates, but previously I had to mute the Comfy function that disables torch.compile in the code. That function seems to be gone in the latest code.

So I get the speed and the extra free VRAM that torch.compile provides with a GGUF model, but I can't replicate the same with FP16. This used to work fine for both FP16 and GGUF.

  • With FP16 it gives me the speed but no extra VRAM, and compilation time is instant (a quick check for whether anything is actually being traced is sketched below this list).
  • With GGUF it gives me both the speed and the VRAM, but there is a very small compilation time window (which is fine).
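A possible sanity check for the "instant compile, no benefit" case, using torch._dynamo.explain. This is a private API (stable across recent 2.x releases but not guaranteed), and the tiny model is just a placeholder for the real FP16 UNet:

```python
import torch

# Report how many graphs dynamo captured and why tracing broke, if it
# did. torch._dynamo.explain is a private API, so treat the exact
# fields as version-dependent; the tiny model stands in for the UNet.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
x = torch.randn(8, 64)

explanation = torch._dynamo.explain(model)(x)
print(f"graphs captured: {explanation.graph_count}, "
      f"graph breaks: {explanation.graph_break_count}")
for reason in explanation.break_reasons:
    print(reason)
# Zero graphs captured would match the "instant compile, no VRAM
# benefit" symptom: nothing was traced, so eager code keeps running.
```

Launching ComfyUI with TORCH_LOGS=graph_breaks,recompiles set in the environment surfaces the same information for the real workflow without any code changes.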

I think these things will be better fixed in PyTorch 2.10, along with the SageAttention patch. I'm currently using 2.9 + CUDA 13 and Python 3.13 (Linux). I still have to verify by running FP16 again to see whether anything changed since the very latest update.

boyan-orion avatar Nov 13 '25 12:11 boyan-orion