
Fused attention error while running Nvidia Cosmos

Open · deepbeepmeep opened this issue 1 year ago · 4 comments

Hello

I am trying to run the latest NVIDIA Cosmos model on an RTX 4090 and I get an error when fused attention is called: line 1080 in fused_attn.py, in fused_attn_forward, at output_tensors = tex.fused_attn_fwd(...).

The transformer_engine compilation didn't produce any errors during installation, and I have cuDNN v9.6.0 installed. I also have Flash Attention 2.7.3; could this be the issue (there is a warning saying that 2.6.3 is the latest supported version)? Is Flash Attention used behind the scenes?

E! CuDNN (v90100 70) function cudnnBackendFinalize() called:
e!     Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: rtc->loadModule()
e!     Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: ptr.isSupported()
e!     Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: engine_post_checks(*engine_iface, engine.getPerfKnobs(), req_size, engine.getTargetSMCount())
e!     Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: finalize_internal()
e!     Time: 2025-01-14T00:21:03.310308 (0d+0h+3m+34s since start)
e!     Process=381629; Thread=381629; GPU=NULL; Handle=NULL; StreamId=NULL.
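In case it helps with diagnosis, this is roughly how I am checking which attention backend TransformerEngine selects. It is only a sketch assuming the NVTE_* debug/backend environment variables described in TE's attention documentation; names and behavior may vary by version:

# Sketch only: assumes the NVTE_* switches from TransformerEngine's attention docs.
# They must be set before transformer_engine is imported.
import os

os.environ["NVTE_DEBUG"] = "1"        # enable attention backend debug logging
os.environ["NVTE_DEBUG_LEVEL"] = "2"  # verbose: print which backend is chosen and why
# To rule Flash Attention in or out, the backends can be toggled individually:
# os.environ["NVTE_FLASH_ATTN"] = "0"  # disable the flash-attn backend
# os.environ["NVTE_FUSED_ATTN"] = "0"  # disable the cuDNN fused-attention backend

import transformer_engine.pytorch as te  # import after setting the variables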

Many thanks in advance.

deepbeepmeep avatar Jan 13 '25 23:01 deepbeepmeep

I got the same problem, but I am not sure how to resolve it.

andypinxinliu avatar Jan 20 '25 23:01 andypinxinliu

I fixed it by installing an earlier version of Flash Attention (2.6.0?).
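If it helps, a quick way to confirm which version is actually picked up at runtime (a sketch; flash_attn exposes __version__ in the builds I have seen):

# Quick sanity check of the versions actually imported at runtime.
import torch
import flash_attn

print("flash_attn:", flash_attn.__version__)  # after the downgrade this should show <= 2.6.x
print("torch:", torch.__version__)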

deepbeepmeep avatar Jan 21 '25 10:01 deepbeepmeep

Thanks for the information. Do you mean by installing:

python -m pip install git+https://github.com/Dao-AILab/[email protected]

By the way, since I do not have sudo, I tried to install everything for Cosmos myself: I installed torch 2.4/2.5 with CUDA 12.1/12.4, then installed nvcc and the CUDA toolkit matching that CUDA version, then installed cuDNN 9.3, all from conda. I am not sure what caused this problem.
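Here is roughly how I check which CUDA/cuDNN versions PyTorch sees at runtime (standard torch APIs; note that TransformerEngine may load a different cuDNN than the one PyTorch was built against, so this is only a partial check):

# Report the CUDA / cuDNN versions visible to PyTorch at runtime.
import torch

print("torch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)   # e.g. 12.1 or 12.4
print("cuDNN:", torch.backends.cudnn.version())    # e.g. 90300 for cuDNN 9.3.x
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")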

andypinxinliu avatar Jan 21 '25 10:01 andypinxinliu

@andypinxinliu I had the same problem; it was solved by this: https://github.com/Dao-AILab/flash-attention/issues/1421#issuecomment-2575547768

jingyangcarl avatar Oct 14 '25 03:10 jingyangcarl