Markus Hoehnerbach

Results: 11 comments by Markus Hoehnerbach

May I suggest [GitHub's Citability Guide](https://guides.github.com/activities/citable-code/)? It would be really cool to have a DOI for the project and make referencing it super easy. Cheers!

Hi @iyupan, this issue should be fixed now.

Hi @faruknane, this issue should be fixed now.

Hi @Y-jiji, are you still experiencing this issue? Typically, when you see no output, it indicates an issue with your CUDA or compiler setup where some DLL cannot be...

Hey Rawn, there are a few ways to implement variable sequence length, and this one seemed simplest when I wrote it; I'd likely handle it differently today. The first basic...

Hey @devashishshankar! I would expect masking P to be much easier, i.e. option (2); however, it will not fix your problem with the output tensor. There, you'd either have to use...

https://github.com/NVIDIA/cutlass/blob/main/examples/77_blackwell_fmha/collective/fmha_fusion.hpp#L180 explains this and what to do about it. Both are, in my opinion, useful settings; we just make a different choice here. Do you think picking the other way as a...

> I don't quite understand "Q is at the beginning / end of the matrix" in the comments, do you mean aligning causal masks to the upper left corner vs....

IIRC, synccheck can find this (due to the smem read/write); that's how I recently found something like this.
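As a hedged illustration (this is a hypothetical minimal repro, not the kernel from the issue), here is the kind of shared-memory read/write hazard that NVIDIA's `compute-sanitizer` tools can flag; racecheck is the tool aimed specifically at smem data races, while synccheck reports invalid or divergent barrier usage:

```cuda
// Hypothetical repro: each thread writes its own shared-memory slot and
// then reads a neighbor's slot with no __syncthreads() in between.
// Check it with, e.g.:
//   compute-sanitizer --tool racecheck ./repro
//   compute-sanitizer --tool synccheck ./repro
#include <cstdio>

__global__ void smem_hazard(int* out) {
    __shared__ int buf[32];
    buf[threadIdx.x] = threadIdx.x;
    // Missing: __syncthreads();
    // Reading a neighbor's slot races with that neighbor's write above.
    out[threadIdx.x] = buf[(threadIdx.x + 1) % 32];
}

int main() {
    int* d_out = nullptr;
    cudaMalloc(&d_out, 32 * sizeof(int));
    smem_hazard<<<1, 32>>>(d_out);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```

On a single warp the race may not manifest as a wrong result on every architecture, but the sanitizer still reports the missing synchronization, which matches how a bug like this can hide until a tool surfaces it.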

@guilhermeleobas @zou3519 I am still seeing errors in the tests, like the following, could you have a look?

```
$ FLASH_ATTENTION_ENABLE_OPCHECK=TRUE FLASH_ATTENTION_ENABLE_AUTOGRAD_CHECK=TRUE python -m pytest -k 'test_flash_attn_varlen_output[256-256-64-True-True-False-15.0-False-False-gqa-dtype0]' test_flash_attn.py
================================================================================================= FAILURES...
```