Sebastien Boisvert comments

Results: 88 comments by Sebastien Boisvert

[Bug] Unexpected behavior of `memory_efficient_attention` with `BlockDiagonalMask`

Can you submit a PR with your code snippet as the basis for a unit test, and possibly apply the proposed fix?

scaled_dot_product_attention output is different from memory_efficient_attention

Can you provide more details, including a small code example that reproduces the problem, the expected result, and the actual result?

suitable for L40

What are the expected behavior and the actual behavior? I use xformers with an NVIDIA A40 on https://www.runpod.io/. Both the A40 and L40 have 48 GB VRAM.

Rotary Embedding Not being Registered

Can you provide the version that you use, a minimal code snippet, the expected result, and the actual result?

RotaryEmbedding applied to the incorrect channel dimension

Hi @sagadre, here is my understanding. Before splitting your tensor into ``H`` heads, the shape of the tensor is ``[B, M, D]``, where ``B`` is batch size, ``M`` is sequence...

RotaryEmbedding applied to the incorrect channel dimension

Hi again @sagadre, it looks like you are right! I looked at MultiHeadDispatch in xformers, which relies on RotaryEmbedding, and indeed it is used after the split into H...

RotaryEmbedding applied to the incorrect channel dimension

Hi again @sagadre, if you look at the unit test for rotary embedding, the input shape is (BATCH, HEADS, SEQ, EMB) and not (BATCH, SEQ, HEADS, EMB): https://github.com/facebookresearch/xformers/blob/748c159096d4f9fcfe3eaf22801e5aed4777210b/tests/test_rotary_embeddings.py#L61 So there...
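To make the two layouts in the comment above concrete, here is a minimal, library-free sketch that converts nested lists from (BATCH, SEQ, HEADS, EMB) to (BATCH, HEADS, SEQ, EMB). The helper names `shape` and `swap_seq_and_heads` are hypothetical and not part of xformers; the sketch only illustrates the axis order and makes no claim about how RotaryEmbedding is implemented.

```python
def shape(x):
    """Return the dimensions of a regular nested list as a tuple."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


def swap_seq_and_heads(x):
    """Convert (BATCH, SEQ, HEADS, EMB) nested lists to (BATCH, HEADS, SEQ, EMB)."""
    # For each batch item, zip(*item) regroups the per-position lists by head.
    return [[list(per_head) for per_head in zip(*item)] for item in x]


B, M, H, E = 2, 3, 4, 8
# Each embedding vector stores its own sequence position, so we can track it.
x = [[[[float(m)] * E for _ in range(H)] for m in range(M)] for _ in range(B)]
y = swap_seq_and_heads(x)

print(shape(x))  # (2, 3, 4, 8)
print(shape(y))  # (2, 4, 3, 8)
```

With PyTorch tensors the same reordering is `x.transpose(1, 2)`. After the swap, the second-to-last axis indexes sequence positions, which matches the (BATCH, HEADS, SEQ, EMB) shape used in the linked unit test.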

Is there an efficient way to use memory_efficient_attention with a causal mask that has a small rectangle of zeros?

Hi @arilato, is it the same mask every time you call memory_efficient_attention?
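If the mask is indeed the same on every call, one option is to build it once and reuse it. The sketch below is a library-free illustration under two assumptions that the thread does not confirm: the "rectangle of zeros" marks query/key pairs that must not attend to each other, and the mask is expressed as an additive bias (0.0 to allow, negative infinity to block). The function name and its parameters are hypothetical.

```python
from functools import lru_cache

NEG_INF = float("-inf")


@lru_cache(maxsize=None)
def causal_bias_with_blocked_rectangle(seq_len, row_start, row_end, col_start, col_end):
    """Additive causal bias with rows [row_start, row_end) x cols [col_start, col_end) blocked."""
    bias = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            allowed = k <= q  # causal: a query only sees itself and earlier keys
            in_rectangle = row_start <= q < row_end and col_start <= k < col_end
            row.append(0.0 if allowed and not in_rectangle else NEG_INF)
        bias.append(tuple(row))
    return tuple(bias)  # immutable, so the cached value is safe to share


bias = causal_bias_with_blocked_rectangle(4, 2, 4, 0, 1)
for row in bias:
    print(row)
```

Repeated calls with the same arguments return the cached object instead of rebuilding the mask. In practice the result would be converted once to a tensor on the right device and passed as the attention bias.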