
Extremely slow to load Flux FP8 after updating to pytorch 2.4.0+cu124; UserWarning: 1Torch was not compiled with flash attention

Open LiJT opened this issue 1 year ago • 5 comments

Expected Behavior

My old environment setup: pytorch version: 2.1.2+cu118, xformers version: 0.0.23.post1

My current setup: pytorch version: 2.4.0+cu124, xformers version: 0.0.28.dev895

Transformers version in both environments: 4.44
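For anyone comparing two Python environments like the ones above, here is a minimal sketch that prints the installed versions of the three packages mentioned. It uses only the standard library; the helper name `installed_versions` is made up for illustration, and a package that is not installed is reported as `None`.

```python
from importlib import metadata


def installed_versions(packages=("torch", "xformers", "transformers")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Running it once with each environment's `python.exe` gives two lists that can be compared side by side.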

With my old Python environment, the Flux fp8 model loads significantly faster: almost 20 seconds faster than with my current torch 2.4.0 setup.

My current version is commit https://github.com/comfyanonymous/ComfyUI/commit/38c22e631ad090a4841e4a0f015a30c565a9f7fc

Actual Behavior

Loading is now very slow. I also notice a warning in the command window that my old environment never showed (the UserWarning line in the output below):

got prompt
Using xformers attention in VAE
Using xformers attention in VAE
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load FluxClipModel_
Loading 1 new model
loaded completely 0.0 9319.23095703125 True
clip missing: ['text_projection.weight']
E:\ComfyUI-aki-v1.3\comfy\ldm\modules\attention.py:407: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load Flux
Loading 1 new model
loaded completely 0.0 11350.048889160156 True
Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 159.87335777282715 True
Prompt executed in 61.99 seconds
got prompt
Prompt executed in 10.52 seconds
got prompt
loaded completely 14760.286697097778 11350.048889160156 True
Prompt executed in 9.98 seconds

What does "UserWarning: 1Torch was not compiled with flash attention" mean? I searched the issues page and saw that many people have exactly the same warning. Is it the reason my model loads so slowly?
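One way to gather more information about the warning is to ask PyTorch which scaled-dot-product-attention backends it currently has enabled. The sketch below is only a diagnostic aid, not an answer to the question: the helper name `sdpa_backend_flags` is made up, and the `torch.backends.cuda.*_sdp_enabled()` flags report whether a backend is switched on, which is not the same as whether the installed build was compiled with that kernel.

```python
def sdpa_backend_flags():
    """Return PyTorch's SDPA backend toggles, or None if torch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None

    cuda_backends = torch.backends.cuda

    def read_flag(name):
        # Older torch builds may lack these functions, so look them up defensively.
        fn = getattr(cuda_backends, name, None)
        return fn() if callable(fn) else None

    return {
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "flash_sdp_enabled": read_flag("flash_sdp_enabled"),
        "mem_efficient_sdp_enabled": read_flag("mem_efficient_sdp_enabled"),
        "math_sdp_enabled": read_flag("math_sdp_enabled"),
    }


if __name__ == "__main__":
    flags = sdpa_backend_flags()
    if flags is None:
        print("torch is not installed in this environment")
    else:
        for key, value in flags.items():
            print(f"{key}: {value}")
```

Comparing the output from the old and the new environment shows whether the two differ in these settings; it does not by itself explain the slower model load.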

Thank you!

Steps to Reproduce

I'm using the official ComfyUI workflow for Flux, with almost no changes: Flux Dev Official.json

I also tried disabling all custom nodes with --disable-all-custom-nodes, but the issue is exactly the same. I also tried --fast: generation is much faster, but loading still took about 2 minutes, which is definitely not normal.

"E:\ComfyUI-aki-v1.3\comfy\ldm\modules\attention.py:407: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)"

Debug Logs

[START] Security scan
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2024-08-28 23:58:35.497151
** Platform: Windows
** Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
** Python executable: E:\ComfyUI-aki-v1.3\python\python.exe
** ComfyUI Path: E:\ComfyUI-aki-v1.3
** Log path: E:\ComfyUI-aki-v1.3\comfyui.log

Prestartup times for custom nodes:
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\rgthree-comfy
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Marigold
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Easy-Use
   3.4 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Manager

Total VRAM 24563 MB, total RAM 65312 MB
pytorch version: 2.4.0+cu124
xformers version: 0.0.28.dev895
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
Using xformers cross attention
[Prompt Server] web root: E:\ComfyUI-aki-v1.3\web
Adding extra search path checkpoints E:/SD-webui-aki\models/Stable-diffusion
Adding extra search path configs E:/SD-webui-aki\models/Stable-diffusion
Adding extra search path vae E:/SD-webui-aki\models/VAE
Adding extra search path loras E:/SD-webui-aki\models/Lora
Adding extra search path loras E:/SD-webui-aki\models/LyCORIS
Adding extra search path upscale_models E:/SD-webui-aki\models/ESRGAN
Adding extra search path upscale_models E:/SD-webui-aki\models/RealESRGAN
Adding extra search path upscale_models E:/SD-webui-aki\models/SwinIR
Adding extra search path embeddings E:/SD-webui-aki\embeddings
Adding extra search path hypernetworks E:/SD-webui-aki\models/hypernetworks
Adding extra search path controlnet E:/SD-webui-aki\models/ControlNet
Adding extra search path clip E:/SD-webui-aki\models/clip/
Adding extra search path clip_vision E:/SD-webui-aki\models/clip_vision
Adding extra search path ipadapter E:/SD-webui-aki\models/ipadapter
Adding extra search path blip E:/SD-webui-aki\models/BLIP
Adding extra search path instantid E:/SD-webui-aki\models/instantid
Adding extra search path insightface E:/SD-webui-aki\models/insightface
Adding E:\ComfyUI-aki-v1.3\custom_nodes to sys.path
Efficiency Nodes: Attempting to add Control Net options to the 'HiRes-Fix Script' Node (comfyui_controlnet_aux add-on)...Success!
Loaded Efficiency nodes from E:\ComfyUI-aki-v1.3\custom_nodes\efficiency-nodes-comfyui
Loaded ControlNetPreprocessors nodes from E:\ComfyUI-aki-v1.3\custom_nodes\comfyui_controlnet_aux
Loaded AdvancedControlNet nodes from E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Advanced-ControlNet
Could not find AnimateDiff nodes
Loaded IPAdapter nodes from E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_IPAdapter_plus
Loaded VideoHelperSuite from E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-VideoHelperSuite
### Loading: ComfyUI-Impact-Pack (V7.3.1)
### Loading: ComfyUI-Impact-Pack (Subpack: V0.6)
### Loading: ComfyUI-Impact-Pack (V7.3.1)
Loaded ImpactPack nodes from E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Impact-Pack
[Impact Pack] Wildcards loading done.
[Impact Pack] Wildcards loading done.
[Crystools INFO] Crystools version: 1.16.6
[Crystools INFO] CPU: Intel(R) Core(TM) i9-14900K - Arch: AMD64 - OS: Windows 10
[Crystools INFO] Pynvml (Nvidia) initialized.
[Crystools INFO] GPU/s:
[Crystools INFO] 0) NVIDIA GeForce RTX 4090
[Crystools INFO] NVIDIA Driver: 560.94
[ComfyUI-Easy-Use] server: v1.2.2 Loaded
[ComfyUI-Easy-Use] web root: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Easy-Use\web_version/v2 Loaded
### Loading: ComfyUI-Impact-Pack (V7.3.1)
[Impact Pack] Wildcards loading done.
### Loading: ComfyUI-Inspire-Pack (V0.86.1)
Total VRAM 24563 MB, total RAM 65312 MB
pytorch version: 2.4.0+cu124
xformers version: 0.0.28.dev895
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
Please install pyav to use video processing functions.
theUpsiders Logic Nodes: Loaded
### Loading: ComfyUI-Manager (V2.50.2)
### ComfyUI Revision: 2622 [38c22e63] | Released on '2024-08-27'
json_repair## OK
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
--------------
 ### Mixlab Nodes: Loaded
ChatGPT.available True
edit_mask.available True
ClipInterrogator.available True
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
PromptGenerate.available True
ChinesePrompt.available True
RembgNode_.available True
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
TripoSR.available
MiniCPMNode.available
 --------------
(pysssss:WD14Tagger) [DEBUG] Available ORT providers: TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider
(pysssss:WD14Tagger) [DEBUG] Using ORT providers: CUDAExecutionProvider, CPUExecutionProvider
Workspace manager - Openning file hash dict
🦄🦄Loading: Workspace Manager (V2.1.0)
------------------------------------------
Comfyroll Studio v1.76 :  175 Nodes Loaded
------------------------------------------
** For changes, please see patch notes at https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes/blob/main/Patch_Notes.md
** For help, please see the wiki at https://github.com/Suzie1/ComfyUI_Comfyroll_CustomNodes/wiki
------------------------------------------
### [START] ComfyUI AlekPet Nodes v1.0.20 ###
 ** Comfly Loaded : fly, just fly
Node -> ArgosTranslateNode: ArgosTranslateCLIPTextEncodeNode, ArgosTranslateTextNode [Loading]
Node -> DeepTranslatorNode: DeepTranslatorCLIPTextEncodeNode, DeepTranslatorTextNode [Loading]
Node -> GoogleTranslateNode: GoogleTranslateCLIPTextEncodeNode, GoogleTranslateTextNode [Loading]
Node -> ExtrasNode: PreviewTextNode, HexToHueNode, ColorsCorrectNode [Loading]
Node -> PoseNode: PoseNode [Loading]
Node -> IDENode: IDENode [Loading]
Node -> PainterNode: PainterNode [Loading]
### [END] ComfyUI AlekPet Nodes ###
FizzleDorf Custom Nodes: Loaded
# 😺dzNodes: LayerStyle -> Cannot import name 'guidedFilter' from 'cv2.ximgproc'
A few nodes cannot works properly, while most nodes are not affected. Please REINSTALL package 'opencv-contrib-python'.
For detail refer to https://github.com/chflame163/ComfyUI_LayerStyle/issues/5
# 😺dzNodes: LayerStyle -> Warning: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\resource_dir.ini not found, default directory to be used.
# 😺dzNodes: LayerStyle -> Find 1 LUTs in E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\lut
# 😺dzNodes: LayerStyle -> Find 1 Fonts in E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\font
Patching UNetModel.forward
UNetModel.forward has been successfully patched.
[Power Noise Suite]: 🦚🦚🦚 Squeaa-squee!!! 🦚🦚🦚
[Power Noise Suite]: Tamed 11 wild nodes.

[rgthree] Loaded 42 epic nodes.
[rgthree] NOTE: Will NOT use rgthree's optimized recursive execution as ComfyUI has changed.

WAS Node Suite: OpenCV Python FFMPEG support is enabled
WAS Node Suite Warning: `ffmpeg_bin_path` is not set in `E:\ComfyUI-aki-v1.3\custom_nodes\was-node-suite-comfyui\was_suite_config.json` config file. Will attempt to use system ffmpeg binaries if available.
WAS Node Suite: Finished. Loaded 218 nodes successfully.

        "You have within you right now, everything you need to deal with whatever the world can throw at you." - Brian Tracy


Import times for custom nodes:
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\rembg-comfyui-node-better
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\AIGODLIKE-ComfyUI-Translation
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_IPAdapter_plus
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\efficiency-nodes-comfyui
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\Comfyui_TTP_Toolset
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ControlNet-LLLite-ComfyUI
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-VideoHelperSuite
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Logic
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\FreeU_Advanced
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\cg-use-everywhere
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_TiledKSampler
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\stability-ComfyUI-nodes
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\cg-image-picker
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-photoshop
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\comfyui_controlnet_aux
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-WD14-Tagger
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\wlsh_nodes
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\websocket_image_save.py
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-eesahesNodes
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\PowerNoiseSuite
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\Comfyui_CXH_joy_caption
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_experiments
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-TiledDiffusion
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Impact-Pack
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-inpaint-nodes
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyMath
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_essentials
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\images-grid-comfy-plugin
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_UltimateSDUpscale
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-DepthAnythingV2
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Custom-Scripts
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Florence2
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\comfy-image-saver
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\Comfyui_Comfly
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\x-flux-comfyui
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\Derfuu_ComfyUI_ModdedNodes
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Advanced-ControlNet
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_ExtraModels
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\rgthree-comfy
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-KJNodes
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-workspace-manager
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Marigold
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-AnimateDiff-Evolved
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_Comfyroll_CustomNodes
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_MiniCPM-V-2_6-int4
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Inspire-Pack
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_bitsandbytes_NF4
   0.0 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Long-CLIP
   0.1 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Crystools
   0.1 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle
   0.2 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-LLaVA-OneVision
   0.2 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_FizzNodes
   0.3 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Manager
   0.3 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-SUPIR
   0.5 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Easy-Use
   1.1 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-mixlab-nodes
   2.3 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_Custom_Nodes_AlekPet
   3.3 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\was-node-suite-comfyui
   3.3 seconds: E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-art-venture





Starting server
To see the GUI go to: http://192.168.3.99:8188 or http://127.0.0.1:8188
To see the GUI go to: https://192.168.3.99:8189 or https://127.0.0.1:8189
E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-mixlab-nodes\webApp\lib/photoswipe-lightbox.esm.min.js
E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-mixlab-nodes\webApp\lib/photoswipe.min.css
FETCH DATA from: E:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-mixlab-nodes\webApp\lib/pickr.min.js
E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-mixlab-nodes\webApp\lib/classic.min.css
[]
[]
E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-mixlab-nodes\webApp\lib/model-viewer.min.js
E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-mixlab-nodes\webApp\lib/juxtapose.min.js
E:\ComfyUI-aki-v1.3\custom_nodes\comfyui-mixlab-nodes\webApp\lib/juxtapose.css
[]
[]
got prompt
Using xformers attention in VAE
Using xformers attention in VAE
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load FluxClipModel_
Loading 1 new model
loaded completely 0.0 9319.23095703125 True
clip missing: ['text_projection.weight']
E:\ComfyUI-aki-v1.3\comfy\ldm\modules\attention.py:407: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load Flux
Loading 1 new model
loaded completely 0.0 11350.048889160156 True
Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 159.87335777282715 True
Prompt executed in 61.99 seconds
got prompt
Prompt executed in 10.52 seconds
got prompt
loaded completely 14760.286697097778 11350.048889160156 True
Prompt executed in 9.98 seconds
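The three "Prompt executed" timings in the log above can be pulled out and compared with a short script. This is a minimal sketch: the function names are made up, and treating "first run minus the average of later runs" as load overhead is an assumption, since the first run also includes text encoding and other one-time work.

```python
import re

# Matches lines such as "Prompt executed in 61.99 seconds".
PROMPT_TIME = re.compile(r"Prompt executed in ([0-9]+(?:\.[0-9]+)?) seconds")


def prompt_times(log_text):
    """Return every 'Prompt executed in N seconds' value, in log order."""
    return [float(value) for value in PROMPT_TIME.findall(log_text)]


def first_run_overhead(times):
    """First run minus the mean of the later runs, or None with fewer than two runs."""
    if len(times) < 2:
        return None
    later = times[1:]
    return times[0] - sum(later) / len(later)


# Timing lines copied from the log above.
SAMPLE_LOG = """\
Prompt executed in 61.99 seconds
got prompt
Prompt executed in 10.52 seconds
got prompt
loaded completely 14760.286697097778 11350.048889160156 True
Prompt executed in 9.98 seconds
"""

if __name__ == "__main__":
    times = prompt_times(SAMPLE_LOG)
    print("runs:", times)
    print(f"first-run overhead: {first_run_overhead(times):.2f} s")
```

On the sample above the first run is about 51.7 seconds slower than the average of the two later runs. Running the same script on a log from the old environment would put a number on the difference between the two setups.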

Other

I saw other people having the same issue https://github.com/comfyanonymous/ComfyUI/issues/3363

LiJT, Aug 28 '24 16:08