Benjamin Lefaudeux

91 comments of Benjamin Lefaudeux

> > During validation, each worker sees a variable number of examples. This is okay in itself, but it is problematic (hang) if it results in any worker having extra batches...
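For context on the hang: collective ops like `all_reduce` block until every rank joins, so a rank with extra validation batches leaves the others waiting forever. A minimal sketch of the failure mode and the usual `DistributedSampler` equalization, assuming the process group is already initialized; `val_dataset` and `model` are hypothetical placeholders, not names from the thread:

```
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# DistributedSampler equalizes per-rank counts: by default it pads with
# repeated samples; drop_last=True trims the tail instead. Either way,
# every rank sees the same number of batches.
# (val_dataset and model are placeholders.)
sampler = DistributedSampler(val_dataset, shuffle=False, drop_last=True)
loader = DataLoader(val_dataset, batch_size=32, sampler=sampler)

for batch in loader:
    loss = model(batch)
    # Every rank must enter this collective the same number of times;
    # with uneven batch counts, ranks that finish early block forever.
    dist.all_reduce(loss)
```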

> Hi @MatthieuTPHR - this looks like a great improvement!
>
> Would it be possible to add a more optimised kernel for head-dim=40, which is the parameter used...

> I've tried running the code in this PR, but I'm getting the following error:
>
> ```
> AttributeError: module 'triton.language' has no attribute 'constexpr'
> ```
>
> ...
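For what it's worth, that AttributeError usually means the installed triton build predates `tl.constexpr`; a quick sanity check (the version interpretation is my reading, not something stated in the thread):

```
import triton
import triton.language as tl

print(triton.__version__)
# Older triton releases lack this attribute, which produces exactly the
# AttributeError quoted above when a kernel declares a tl.constexpr arg.
print(hasattr(tl, "constexpr"))
```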

> Does it require a GPU with tensor cores (RTX 20 Series and above)? Getting: `WARNING:root:Blocksparse is not available: the current GPU does not expose Tensor cores`
> ...
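For reference, the warning can be anticipated with a plain capability check; the (7, 0) cutoff is the usual Volta-and-newer threshold for tensor cores, my addition rather than something taken from the thread:

```
import torch

# Tensor cores require compute capability >= 7.0 (Volta and newer);
# RTX 20xx cards report 7.5, while a GTX 10xx reports 6.1, which is
# what triggers the warning above.
if torch.cuda.is_available():
    cc = torch.cuda.get_device_capability()
    print(cc, "tensor cores:", cc >= (7, 0))
```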

> @blefaudeux I'm using linux in wsl2, the problem might be related to the version of torch and torchvision

Really sorry about that... are you able to use conda there...

> Tested on GTX 1070ti: Without Memory efficient cross attention at 512x512: **1.78 it/s**
>
> With Memory efficient cross attention at 512x512: **2.34 it/s**
>
> ...

FYI, installing xformers should be easier now on Linux platforms (especially with Colab): just `pip install xformers` should give you this attention mechanism (the kernels come pre-built) as of a...
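As a sketch of what this looks like from the diffusers side, assuming a recent diffusers release that exposes the xformers hook (the model id is only an example):

```
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swaps the pipeline's attention blocks for xformers' memory-efficient
# kernels; requires `pip install xformers` as noted above.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a photo of an astronaut riding a horse").images[0]
```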

cc @patrickvonplaten, not what we discussed, but this is an effective three-liner
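The preview cuts off before the three lines themselves, so this is only a hedged guess at what such a drop-in amounts to with `xformers.ops.memory_efficient_attention` (the op is real; the shapes and values are illustrative):

```
import torch
import xformers.ops as xops

# [batch, seq_len, heads, head_dim] layout; sizes are illustrative.
q = k = v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Stands in for softmax(q @ k^T / sqrt(d)) @ v without materializing
# the full attention matrix.
out = xops.memory_efficient_attention(q, k, v)
```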

> Hey @blefaudeux,
>
> How do you use this feature? I think it's only used in decoding if `"force_not_quantize"` is set to `True`, no? It's [in the...