Benjamin Lefaudeux
# 🚀 Feature When applicable, automatically use sparse or blocksparse for causal attention. Right now this requires that people use them explicitly, even if the causal flag is passed, which...
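A minimal sketch of the idea in plain PyTorch, not the xformers API; the block size, threshold, and helper names below are made up for illustration:

```python
import torch

def causal_blocksparse_layout(seq_len: int, block_size: int = 128) -> torch.Tensor:
    # One boolean per (query-block, key-block) pair: keep every block on or
    # below the block diagonal, since those are the only ones causality allows.
    n_blocks = (seq_len + block_size - 1) // block_size
    return torch.tril(torch.ones(n_blocks, n_blocks, dtype=torch.bool))

def pick_attention(causal: bool, seq_len: int):
    # Sketch of the proposed dispatch: if the causal flag is set (and the
    # sequence is long enough for sparsity to pay off), build the layout and
    # take the blocksparse path automatically instead of requiring an opt-in.
    if causal and seq_len >= 1024:
        return "blocksparse", causal_blocksparse_layout(seq_len)
    return "dense", None
```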
# 🚀 Feature Lower favor+causal memory consumption ## Motivation Using a lot of memory for an approximation kind of defeats the purpose. ## Pitch Would make favor more usable for...
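For context, a rough PyTorch sketch of why the causal path is memory hungry in the straightforward formulation (the feature map is assumed to have been applied to q and k already; this is not the xformers implementation): the prefix sum materializes a per-position outer-product tensor of shape (batch, seq, dim_k, dim_v).

```python
import torch

def causal_linear_attention(q, k, v, eps: float = 1e-6):
    # q, k: (batch, seq, dim_k), already passed through the random-feature map
    # v:    (batch, seq, dim_v)
    kv = torch.einsum("bsd,bse->bsde", k, v)   # per-step outer products k_i v_i^T
    kv = kv.cumsum(dim=1)                      # causal prefix sum -> (b, s, d_k, d_v)
    z = k.cumsum(dim=1)                        # prefix sum of keys for the normalizer
    num = torch.einsum("bsd,bsde->bse", q, kv)
    den = torch.einsum("bsd,bsd->bs", q, z).unsqueeze(-1)
    return num / (den + eps)
```

Lowering that footprint generally means computing the prefix sum in chunks or with a sequential scan, trading some speed for memory.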
# 🐛 Possible Bug - Nystrom does not pass the test in https://github.com/facebookresearch/xformers/pull/104 - with the same #104, the Nystrom-specific test does not pass if causal is set (+ NaNs...
# 🚀 Feature Support tensor parallelism or model parallelism as a built-in feature, through Fairscale? cc @min-xu-ai @anj-s @suchenzang @VitaliyLi @iyerr3 ## Motivation This is typically extra work...
# 🚀 Feature Luna has an extra "context" path; I think that several other attentions do something similar (like the attentions which try to keep a long-term memory), it...
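As a purely hypothetical interface sketch (not the xformers one), the common pattern could be expressed as an attention whose forward takes and returns an optional state tensor, so Luna-style context and long-term-memory attentions all plug in the same way:

```python
from typing import Optional, Tuple
import torch

class AttentionWithContext(torch.nn.Module):
    # Hypothetical signature for illustration: q/k/v in, output plus an
    # (optionally updated) context out.
    def forward(
        self,
        q: torch.Tensor,                         # (batch, seq_q, dim)
        k: torch.Tensor,                         # (batch, seq_k, dim)
        v: torch.Tensor,                         # (batch, seq_k, dim)
        context: Optional[torch.Tensor] = None,  # (batch, seq_ctx, dim)
    ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
        if context is not None:
            # Toy behaviour: also attend over the context tokens.
            k = torch.cat([context, k], dim=1)
            v = torch.cat([context, v], dim=1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v, context
```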
# 🐛 Bug Right now the LRA implementation uses attention masking ([see](https://github.com/facebookresearch/xformers/blob/main/xformers/benchmarks/LRA/code/model_wrapper.py#L199)) for the MLM task, which is probably wrong for a couple of attentions (would need investigation). Key masking...
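For reference, a plain-PyTorch sketch of the distinction (names are illustrative): key masking drops padded key tokens for every query, which is what the MLM task needs, whereas an attention mask restricts specific (query, key) pairs.

```python
import torch

def masked_attention(q, k, v, key_padding_mask=None, attn_mask=None):
    # q: (batch, seq_q, dim), k/v: (batch, seq_k, dim)
    # key_padding_mask: (batch, seq_k), True where the key token is padding
    # attn_mask:        (seq_q, seq_k), True where attention is NOT allowed
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    if key_padding_mask is not None:
        # Key masking: remove padded keys for *all* queries.
        scores = scores.masked_fill(key_padding_mask[:, None, :], float("-inf"))
    if attn_mask is not None:
        # Attention masking: forbid specific (query, key) pairs, e.g. causality.
        scores = scores.masked_fill(attn_mask[None, :, :], float("-inf"))
    return scores.softmax(dim=-1) @ v
```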
- move the enable/disable call to being part of the base DiffusionPipeline (removes a bunch of duplicates)
- make the call recursive across all the modules in the model graph,...
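A rough sketch of the recursive part of the second bullet (the setter name is hypothetical, this is not the diffusers implementation): walk every submodule and flip the flag wherever a module exposes the setter.

```python
import torch

def set_flag_recursively(module: torch.nn.Module, enabled: bool) -> None:
    # Hypothetical setter name, used for illustration only.
    if hasattr(module, "set_use_memory_efficient_attention"):
        module.set_use_memory_efficient_attention(enabled)
    for child in module.children():
        set_flag_recursively(child, enabled)
```

The base DiffusionPipeline's enable/disable pair would then just call this on each nn.Module it registers, instead of every pipeline re-implementing the walk.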
20 GB -> 16 GB RAM use for some workloads, same speed (you don't have to materialize intermediates with torch.cdist)
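For illustration (sizes are made up): the broadcast-and-subtract version materializes an (N, M, D) intermediate, while torch.cdist returns the same pairwise distances without it.

```python
import torch

x = torch.randn(1024, 64)  # N x D
y = torch.randn(1024, 64)  # M x D

# Naive pairwise distances: the (N, M, D) difference tensor lives in memory.
d_naive = (x[:, None, :] - y[None, :, :]).norm(dim=-1)

# Same result with torch.cdist, no (N, M, D) intermediate.
d_cdist = torch.cdist(x, y, p=2.0)

assert torch.allclose(d_naive, d_cdist, atol=1e-3)
```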