Dan Fu

103 comments by Dan Fu

Is this the same as during training? These shapes should not trigger the bug that you reported earlier.

Yes it does. You need to create a different MLP for each layer and adjust the sizes accordingly.
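A minimal, generic sketch of that idea, with hypothetical layer widths and a hypothetical 4x expansion (the repo's actual module is the BlockdiagLinear-based MLP shown further down); the point is one MLP instance per layer, sized to that layer, rather than a single shared module:

```Python
import torch
from torch import nn

hidden_sizes = [768, 768, 1024]  # hypothetical per-layer hidden sizes

# One MLP per layer, each matching that layer's width
mlps = nn.ModuleList([
    nn.Sequential(
        nn.Linear(d, 4 * d),   # expand (4x factor is illustrative)
        nn.GELU(),
        nn.Linear(4 * d, d),   # project back to the layer's hidden size
    )
    for d in hidden_sizes
])
```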

These are the relevant lines of code in `bert/src/mm/blockdiag_multiply.py` that are triggering the bug:

```
def forward(ctx, x, weight):
    ctx.save_for_backward(x, weight)
    batch_shape, n = x.shape[:-1], x.shape[-1]
    batch_dim = np.prod(batch_shape)
    nblocks, ...
```

x is not shaped correctly here. It should have shape (B, …, D) where B is your batch size, the middle … are your other dimensions, and D is the feature (hidden) dimension.
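A quick illustration with made-up shapes: the multiply expects the feature dimension D to come last, so a channel-first tensor needs a transpose before the call.

```Python
import torch

x = torch.randn(8, 256, 512)     # (B, seq_len, D) -- D last, as expected
x_cf = torch.randn(8, 512, 256)  # (B, D, seq_len) -- channel-first, breaks the shape logic above
x_ok = x_cf.transpose(1, 2)      # (B, seq_len, D) -- fixed
```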

Great! A couple of things could be happening here:

- A depthwise convolution only mixes along the sequence (or H/W) dimensions, and not over the channels in the image. For...
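A small sketch of that distinction in plain PyTorch (not code from this repo): a depthwise convolution uses `groups` equal to the channel count, so each channel is filtered independently over H/W, and it's the 1x1 pointwise convolution that actually mixes channels.

```Python
import torch
from torch import nn

C = 64
x = torch.randn(2, C, 32, 32)  # (B, C, H, W)

# groups=C => one filter per channel: mixes only over H/W, never across channels
depthwise = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C)

# the 1x1 conv is the part that mixes information across channels
pointwise = nn.Conv2d(C, C, kernel_size=1)

y = pointwise(depthwise(x))  # depthwise-separable pattern
```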

Hello! Yes, this would be a great use case. You'll want to replace the attention layer with something like this: https://github.com/HazyResearch/m2/blob/main/bert/src/mm/monarch_mixer_sequence_mixer.py

And the MLP layer with something like this: https://github.com/HazyResearch/m2/blob/main/bert/src/bert_layers.py#L297...
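Conceptually, the swap looks something like the generic pre-norm block below. The attribute names and layer structure here are illustrative, not the exact layout in `bert_layers.py`; `sequence_mixer` and `mlp` stand in for whatever you instantiate from the two linked files.

```Python
import torch
from torch import nn

class SwappedBlock(nn.Module):
    """Sketch of a transformer-style block with attention and the dense MLP replaced."""
    def __init__(self, dim, sequence_mixer: nn.Module, mlp: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.sequence_mixer = sequence_mixer  # replaces self-attention
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = mlp                        # replaces the dense feed-forward block

    def forward(self, x):
        x = x + self.sequence_mixer(self.norm1(x))  # residual around the sequence mixer
        x = x + self.mlp(self.norm2(x))             # residual around the MLP
        return x
```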

You can replace your MLP class with this one (changing the configs to however it works in your model):

```Python
import torch
from torch import nn
from src.mm.blockdiag_linear import BlockdiagLinear
...
```
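The snippet above is cut off; the kind of class it continues into looks roughly like the sketch below. The sizes, field names, and the `BlockdiagLinear` constructor kwargs here are assumptions, so check the linked files for the exact signatures.

```Python
import torch
from torch import nn
from src.mm.blockdiag_linear import BlockdiagLinear

class MonarchMLP(nn.Module):
    """Rough sketch: a feed-forward block whose dense layers are block-diagonal.
    Sizes and BlockdiagLinear kwargs are illustrative, not the repo's exact values."""
    def __init__(self, hidden_size=768, intermediate_size=3072, nblocks=4):
        super().__init__()
        self.up = BlockdiagLinear(hidden_size, intermediate_size, nblocks=nblocks)
        self.act = nn.GELU()
        self.down = BlockdiagLinear(intermediate_size, hidden_size, nblocks=nblocks)

    def forward(self, x):
        return self.down(self.act(self.up(x)))
```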

`lm` is a mapping to `long_conv_lm`: https://github.com/HazyResearch/safari/blob/main/src/utils/registry.py#L24
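For context, that registry is just a dictionary from short config names to import targets; a minimal, generic illustration of the pattern (not safari's actual code; the real `lm` entry is at the linked line):

```Python
import importlib

registry = {"lm": "src.models.sequence.long_conv_lm"}  # illustrative target path

def resolve(name: str):
    # Look up the short name and import the corresponding module
    return importlib.import_module(registry[name])
```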

See this for how to train with this repo: https://github.com/HazyResearch/safari/blob/main/experiments.md

Not yet, but it's something we're very interested in looking into soon. To help us target it better: what models do you want to use it for? Are they depthwise...