Markus Hoehnerbach

Results: 11 comments by Markus Hoehnerbach

May I suggest [GitHub's Citability Guide](https://guides.github.com/activities/citable-code/)? It would be really cool to have a DOI for the project and make referencing it super easy. Cheers!

Hi @iyupan, this issue should be fixed now.

Hi @faruknane, this issue should be fixed now.

Hi @Y-jiji, are you still experiencing this issue? Typically, when you see no output, it indicates an issue with your CUDA or compiler setup where some DLL cannot be...

Hey Rawn, there are a few ways to implement variable sequence length, and this one seemed simplest when I wrote it; I'd likely handle it differently today. The first basic...

Hey @devashishshankar! I would expect masking P to be much easier, i.e. option (2); however, it will not fix your problem with the output tensor. There, you'd either have to use...

https://github.com/NVIDIA/cutlass/blob/main/examples/77_blackwell_fmha/collective/fmha_fusion.hpp#L180 explains this and what to do about it. Both are, in my opinion, useful settings; we just make a different choice here. Do you think picking the other way as a...

> I don't quite understand "Q is at the beginning / end of the matrix" in the comments, do you mean aligning causal masks to the upper left corner vs....

IIRC, synccheck can find this (due to the smem read/write); that's how I recently found something like this.
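As a hedged illustration (this is a hypothetical minimal repro, not the kernel from the issue), here is the kind of shared-memory read/write hazard that NVIDIA's `compute-sanitizer` tools can flag; racecheck is the tool aimed specifically at smem data races, while synccheck reports invalid or divergent barrier usage:

```cuda
// Hypothetical repro: each thread writes its own shared-memory slot and
// then reads a neighbor's slot with no __syncthreads() in between.
// Check it with, e.g.:
//   compute-sanitizer --tool racecheck ./repro
//   compute-sanitizer --tool synccheck ./repro
#include <cstdio>

__global__ void smem_hazard(int* out) {
    __shared__ int buf[32];
    buf[threadIdx.x] = threadIdx.x;
    // Missing: __syncthreads();
    // Reading a neighbor's slot races with that neighbor's write above.
    out[threadIdx.x] = buf[(threadIdx.x + 1) % 32];
}

int main() {
    int* d_out = nullptr;
    cudaMalloc(&d_out, 32 * sizeof(int));
    smem_hazard<<<1, 32>>>(d_out);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```

On a single warp the race may not manifest as a wrong result on every architecture, but the sanitizer still reports the missing synchronization, which matches how a bug like this can hide until a tool surfaces it.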

@guilhermeleobas @zou3519 I am still seeing errors in the tests, like the following, could you have a look?

```
$ FLASH_ATTENTION_ENABLE_OPCHECK=TRUE FLASH_ATTENTION_ENABLE_AUTOGRAD_CHECK=TRUE python -m pytest -k 'test_flash_attn_varlen_output[256-256-64-True-True-False-15.0-False-False-gqa-dtype0]' test_flash_attn.py
================================================================================================= FAILURES...
```