zzq96
I have searched the AAAI digital library but didn't find it. Can you provide a download link for the supplementary material? Thanks a lot!
In Triton, tl.atomic_add uses "acq_rel" memory semantics by default. After changing "acq_rel" to "relaxed", my flash attention Triton code ran 2x faster.
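A minimal sketch of what that change looks like, assuming a Triton version whose `tl.atomic_add` accepts the `sem` argument. The kernel and helper names (`scatter_add_kernel`, `scatter_add`, `scatter_add_reference`) and the block size are hypothetical, not taken from the flash attention code. The pure-Python reference is only there to check results on a machine without a GPU.

```python
# Guarded imports: the Triton path is only defined when torch + triton + CUDA exist.
try:
    import torch
    import triton
    import triton.language as tl
    HAS_TRITON = torch.cuda.is_available()
except Exception:
    HAS_TRITON = False


def scatter_add_reference(values, indices, num_bins):
    """Pure-Python reference: out[indices[i]] += values[i]."""
    out = [0.0] * num_bins
    for v, i in zip(values, indices):
        out[i] += v
    return out


if HAS_TRITON:

    @triton.jit
    def scatter_add_kernel(out_ptr, val_ptr, idx_ptr, n_elements, BLOCK: tl.constexpr):
        pid = tl.program_id(axis=0)
        offs = pid * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n_elements
        idx = tl.load(idx_ptr + offs, mask=mask, other=0)
        val = tl.load(val_ptr + offs, mask=mask, other=0.0)
        # Omitting sem gives the default "acq_rel" ordering.
        # "relaxed" keeps the add atomic but drops the ordering guarantees.
        tl.atomic_add(out_ptr + idx, val, mask=mask, sem="relaxed")

    def scatter_add(values, indices, num_bins, block=256):
        """Hypothetical launcher: values and indices are 1-D CUDA tensors."""
        out = torch.zeros(num_bins, device=values.device, dtype=values.dtype)
        n = values.numel()
        grid = (triton.cdiv(n, block),)
        scatter_add_kernel[grid](out, values, indices, n, BLOCK=block)
        return out
```

"relaxed" should only be safe when nothing else in the kernel relies on the atomic to order other loads and stores, which is usually the case for a plain accumulation like this one. Whether it gives a similar speedup elsewhere depends on the kernel and the hardware.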
I have written a Triton backward (bwd) kernel, but I found that after adding the bias bwd, it is 30 times slower. The following is the time_weight bwd...
I have a tensor of shape [128,]. I want to store it in shared memory and then use atomic_add to update it, like this: ``` def test(output_ptr, input_ptr, input_index_ptr):...
The Folder Note plugin can show notes like this: a card shows the title and part of the note. This allows you to quickly recall the content of your...