Dan Fu
Is this the same as during training? These shapes should not trigger the bug that you have reported earlier.
Yes it does. You need to create a different MLP for each layer and adjust the sizes accordingly.
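A minimal sketch of that idea, assuming a plain two-layer MLP: each layer gets its own module with its own hidden size instead of sharing one. The `make_mlp` helper, `d_model`, and the per-layer sizes are hypothetical.

```Python
from torch import nn

def make_mlp(d_in, d_hidden, d_out):
    # A plain two-layer MLP; swap in whatever MLP class your model actually uses.
    return nn.Sequential(nn.Linear(d_in, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_out))

d_model = 768
# One MLP per layer, each sized for that layer, rather than one shared module.
layer_hidden_sizes = [3072, 3072, 1536, 1536]
mlps = nn.ModuleList([make_mlp(d_model, h, d_model) for h in layer_hidden_sizes])
```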
These are the relevant lines of code in `bert/src/mm/blockdiag_multiply.py` that are triggering the bug:
```
def forward(ctx, x, weight):
    ctx.save_for_backward(x, weight)
    batch_shape, n = x.shape[:-1], x.shape[-1]
    batch_dim = np.prod(batch_shape)
    nblocks, ...
```
x is not shaped correctly here. It should have shape (B, …, D) where B is your batch size, the middle … are your other dimensions, and D is the model dimension.
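For reference, a minimal sketch of a block-diagonal multiply showing the expected (B, …, D) input contract. This is an illustration only, not the repo's kernel; the dimension names and sizes are assumptions.

```Python
import torch

def blockdiag_multiply_sketch(x, weight):
    # x:      (B, ..., D) with D = nblocks * p
    # weight: (nblocks, q, p), one dense block per slice of the feature dimension
    batch_shape, n = x.shape[:-1], x.shape[-1]
    nblocks, q, p = weight.shape
    assert n == nblocks * p, "last dim of x must equal nblocks * p"
    x = x.reshape(*batch_shape, nblocks, p)
    # Each feature block is multiplied by its own (q, p) weight block.
    out = torch.einsum('...kp,kqp->...kq', x, weight)
    return out.reshape(*batch_shape, nblocks * q)

x = torch.randn(8, 128, 768)               # (batch, seq, D)
weight = torch.randn(4, 192, 192)           # nblocks=4 and p=q=192, so 4 * 192 = 768 = D
y = blockdiag_multiply_sketch(x, weight)    # (8, 128, 768)
```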
Great! A couple things that could be happening here:
- A depthwise convolution only mixes along the sequence (or H/W) dimensions, and not over the channels in the image. For...
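A minimal sketch of that first point: a depthwise convolution in PyTorch is a Conv1d/Conv2d with `groups` equal to the channel count, so each channel is filtered independently and no channel mixing happens. The sizes below are arbitrary.

```Python
import torch
from torch import nn

batch, channels, seq_len = 8, 64, 128
# groups=channels makes the convolution depthwise: each channel is convolved
# along the sequence dimension only, with no mixing across channels.
depthwise = nn.Conv1d(channels, channels, kernel_size=3, padding=1, groups=channels)

x = torch.randn(batch, channels, seq_len)   # (batch, channels, seq)
y = depthwise(x)                            # still (8, 64, 128); only positions within a channel are mixed
```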
Hello! Yes, this would be a great use case. You'll want to replace the attention layer with something like this: https://github.com/HazyResearch/m2/blob/main/bert/src/mm/monarch_mixer_sequence_mixer.py
And the MLP layer with something like this: https://github.com/HazyResearch/m2/blob/main/bert/src/bert_layers.py#L297...
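A minimal sketch of where those two pieces slot in, assuming a pre-norm residual block; the sequence mixer and MLP are passed in as modules, standing in for the classes linked above (whose constructor arguments are not shown here).

```Python
import torch
from torch import nn

class M2StyleBlock(nn.Module):
    def __init__(self, d_model, sequence_mixer: nn.Module, mlp: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.sequence_mixer = sequence_mixer   # replaces the attention layer
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = mlp                         # replaces the dense MLP

    def forward(self, x):                      # x: (batch, seq, d_model)
        x = x + self.sequence_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

# Smoke test with placeholder modules; in practice pass the M2 sequence mixer and MLP.
block = M2StyleBlock(768, sequence_mixer=nn.Identity(), mlp=nn.Linear(768, 768))
out = block(torch.randn(2, 128, 768))          # (2, 128, 768)
```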
You can replace your MLP class with this one (changing the configs to however it works in your model):
```Python
import torch
from torch import nn
from src.mm.blockdiag_linear import BlockdiagLinear
...
```
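The snippet above is cut off; here is a minimal sketch of what such an MLP might look like, assuming `BlockdiagLinear` is a drop-in replacement for `nn.Linear` that also takes an `nblocks` argument. The config names (`d_model`, `expand`, `nblocks`) are assumptions, not the repo's.

```Python
from torch import nn
from src.mm.blockdiag_linear import BlockdiagLinear

class BlockdiagMLP(nn.Module):
    def __init__(self, d_model, expand=4, nblocks=4):
        super().__init__()
        # Both projections use block-diagonal weight matrices; nblocks is assumed
        # to be the keyword controlling the number of diagonal blocks.
        self.fc1 = BlockdiagLinear(d_model, expand * d_model, nblocks=nblocks)
        self.act = nn.GELU()
        self.fc2 = BlockdiagLinear(expand * d_model, d_model, nblocks=nblocks)

    def forward(self, x):   # x: (batch, seq, d_model)
        return self.fc2(self.act(self.fc1(x)))
```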
`lm` is a mapping to `long_conv_lm`: https://github.com/HazyResearch/safari/blob/main/src/utils/registry.py#L24
See this on how to train with this repo: https://github.com/HazyResearch/safari/blob/main/experiments.md
Not yet, but it's something we're very interested in looking into soon. To help us target better - what models do you want to use it for? Are they depthwise...