Benjamin Lefaudeux
# 🚀 Feature When applicable, automatically use sparse or blocksparse for causal attention. Right now this requires that people use them explicitly, even if the causal flag is passed, which...
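A minimal sketch of the idea in plain PyTorch, not the xformers API; the block size, threshold, and helper names below are made up for illustration:

```python
import torch

def causal_blocksparse_layout(seq_len: int, block_size: int = 128) -> torch.Tensor:
    # One boolean per (query-block, key-block) pair: keep every block on or
    # below the block diagonal, since those are the only ones causality allows.
    n_blocks = (seq_len + block_size - 1) // block_size
    return torch.tril(torch.ones(n_blocks, n_blocks, dtype=torch.bool))

def pick_attention(causal: bool, seq_len: int):
    # Sketch of the proposed dispatch: if the causal flag is set (and the
    # sequence is long enough for sparsity to pay off), build the layout and
    # take the blocksparse path automatically instead of requiring an opt-in.
    if causal and seq_len >= 1024:
        return "blocksparse", causal_blocksparse_layout(seq_len)
    return "dense", None
```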
# 🚀 Feature Lower favor+causal memory consumption ## Motivation Using a lot of memory for an approximation kind of defeats the purpose. ## Pitch Would make favor more usable for...
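For context, a rough PyTorch sketch of why the causal path is memory hungry in the straightforward formulation (the feature map is assumed to have been applied to q and k already; this is not the xformers implementation): the prefix sum materializes a per-position outer-product tensor of shape (batch, seq, dim_k, dim_v).

```python
import torch

def causal_linear_attention(q, k, v, eps: float = 1e-6):
    # q, k: (batch, seq, dim_k), already passed through the random-feature map
    # v:    (batch, seq, dim_v)
    kv = torch.einsum("bsd,bse->bsde", k, v)   # per-step outer products k_i v_i^T
    kv = kv.cumsum(dim=1)                      # causal prefix sum -> (b, s, d_k, d_v)
    z = k.cumsum(dim=1)                        # prefix sum of keys for the normalizer
    num = torch.einsum("bsd,bsde->bse", q, kv)
    den = torch.einsum("bsd,bsd->bs", q, z).unsqueeze(-1)
    return num / (den + eps)
```

Lowering that footprint generally means computing the prefix sum in chunks or with a sequential scan, trading some speed for memory.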
# 🐛 Possible Bug - Nystrom does not pass the test in https://github.com/facebookresearch/xformers/pull/104 - with the same #104, the Nystrom-specific test does not pass if causal is set (+ NaNs...
# 🚀 Feature Support tensor parallelism or model parallelism as a built-in feature, through Fairscale? cc @min-xu-ai @anj-s @suchenzang @VitaliyLi @iyerr3 ## Motivation This is typically extra work...
# 🚀 Feature Luna has an extra "context" path; I think that several other attentions do something similar (like the attentions which try to keep a long-term memory), it...
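As a purely hypothetical interface sketch (not the xformers one), the common pattern could be expressed as an attention whose forward takes and returns an optional state tensor, so Luna-style context and long-term-memory attentions all plug in the same way:

```python
from typing import Optional, Tuple
import torch

class AttentionWithContext(torch.nn.Module):
    # Hypothetical signature for illustration: q/k/v in, output plus an
    # (optionally updated) context out.
    def forward(
        self,
        q: torch.Tensor,                         # (batch, seq_q, dim)
        k: torch.Tensor,                         # (batch, seq_k, dim)
        v: torch.Tensor,                         # (batch, seq_k, dim)
        context: Optional[torch.Tensor] = None,  # (batch, seq_ctx, dim)
    ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
        if context is not None:
            # Toy behaviour: also attend over the context tokens.
            k = torch.cat([context, k], dim=1)
            v = torch.cat([context, v], dim=1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v, context
```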
# 🐛 Bug Right now the LRA implementation uses attention masking ([see](https://github.com/facebookresearch/xformers/blob/main/xformers/benchmarks/LRA/code/model_wrapper.py#L199)) for the MLM task, which is probably wrong for a couple of attentions (would need investigation). Key masking...
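For reference, a plain-PyTorch sketch of the distinction (names are illustrative): key masking drops padded key tokens for every query, which is what the MLM task needs, whereas an attention mask restricts specific (query, key) pairs.

```python
import torch

def masked_attention(q, k, v, key_padding_mask=None, attn_mask=None):
    # q: (batch, seq_q, dim), k/v: (batch, seq_k, dim)
    # key_padding_mask: (batch, seq_k), True where the key token is padding
    # attn_mask:        (seq_q, seq_k), True where attention is NOT allowed
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    if key_padding_mask is not None:
        # Key masking: remove padded keys for *all* queries.
        scores = scores.masked_fill(key_padding_mask[:, None, :], float("-inf"))
    if attn_mask is not None:
        # Attention masking: forbid specific (query, key) pairs, e.g. causality.
        scores = scores.masked_fill(attn_mask[None, :, :], float("-inf"))
    return scores.softmax(dim=-1) @ v
```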
- move the enable/disable call to being part of the base DiffusionPipeline (removes a bunch of duplicates)
- make the call recursive across all the modules in the model graph,...
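A rough sketch of the recursive part of the second bullet (the setter name is hypothetical, this is not the diffusers implementation): walk every submodule and flip the flag wherever a module exposes the setter.

```python
import torch

def set_flag_recursively(module: torch.nn.Module, enabled: bool) -> None:
    # Hypothetical setter name, used for illustration only.
    if hasattr(module, "set_use_memory_efficient_attention"):
        module.set_use_memory_efficient_attention(enabled)
    for child in module.children():
        set_flag_recursively(child, enabled)
```

The base DiffusionPipeline's enable/disable pair would then just call this on each nn.Module it registers, instead of every pipeline re-implementing the walk.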
20 GB -> 16 GB RAM use for some workloads, same speed (you don't have to materialize intermediates with torch.cdist)
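For illustration (sizes are made up): the broadcast-and-subtract version materializes an (N, M, D) intermediate, while torch.cdist returns the same pairwise distances without it.

```python
import torch

x = torch.randn(1024, 64)  # N x D
y = torch.randn(1024, 64)  # M x D

# Naive pairwise distances: the (N, M, D) difference tensor lives in memory.
d_naive = (x[:, None, :] - y[None, :, :]).norm(dim=-1)

# Same result with torch.cdist, no (N, M, D) intermediate.
d_cdist = torch.cdist(x, y, p=2.0)

assert torch.allclose(d_naive, d_cdist, atol=1e-3)
```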