Dan Fu
Is this the same as during training? These shapes should not trigger the bug that you have reported earlier.
Yes it does. You need to create a different MLP for each layer and adjust the sizes accordingly.
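A minimal sketch of that idea, assuming a plain two-layer MLP: each layer gets its own module with its own hidden size instead of sharing one. The `make_mlp` helper, `d_model`, and the per-layer sizes are hypothetical.

```Python
from torch import nn

def make_mlp(d_in, d_hidden, d_out):
    # A plain two-layer MLP; swap in whatever MLP class your model actually uses.
    return nn.Sequential(nn.Linear(d_in, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_out))

d_model = 768
# One MLP per layer, each sized for that layer, rather than one shared module.
layer_hidden_sizes = [3072, 3072, 1536, 1536]
mlps = nn.ModuleList([make_mlp(d_model, h, d_model) for h in layer_hidden_sizes])
```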
These are the relevant lines of code in `bert/src/mm/blockdiag_multiply.py` that are triggering the bug:
```
def forward(ctx, x, weight):
    ctx.save_for_backward(x, weight)
    batch_shape, n = x.shape[:-1], x.shape[-1]
    batch_dim = np.prod(batch_shape)
    nblocks, ...
```
x is not shaped correctly here. It should have shape (B, …, D) where B is your batch size, the middle … are your other dimensions, and D is the model dimension.
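For reference, a minimal sketch of a block-diagonal multiply showing the expected (B, …, D) input contract. This is an illustration only, not the repo's kernel; the dimension names and sizes are assumptions.

```Python
import torch

def blockdiag_multiply_sketch(x, weight):
    # x:      (B, ..., D) with D = nblocks * p
    # weight: (nblocks, q, p), one dense block per slice of the feature dimension
    batch_shape, n = x.shape[:-1], x.shape[-1]
    nblocks, q, p = weight.shape
    assert n == nblocks * p, "last dim of x must equal nblocks * p"
    x = x.reshape(*batch_shape, nblocks, p)
    # Each feature block is multiplied by its own (q, p) weight block.
    out = torch.einsum('...kp,kqp->...kq', x, weight)
    return out.reshape(*batch_shape, nblocks * q)

x = torch.randn(8, 128, 768)               # (batch, seq, D)
weight = torch.randn(4, 192, 192)           # nblocks=4 and p=q=192, so 4 * 192 = 768 = D
y = blockdiag_multiply_sketch(x, weight)    # (8, 128, 768)
```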
Great! A couple things that could be happening here:
- A depthwise convolution only mixes along the sequence (or H/W) dimensions, and not over the channels in the image. For...
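A minimal sketch of that first point: a depthwise convolution in PyTorch is a Conv1d/Conv2d with `groups` equal to the channel count, so each channel is filtered independently and no channel mixing happens. The sizes below are arbitrary.

```Python
import torch
from torch import nn

batch, channels, seq_len = 8, 64, 128
# groups=channels makes the convolution depthwise: each channel is convolved
# along the sequence dimension only, with no mixing across channels.
depthwise = nn.Conv1d(channels, channels, kernel_size=3, padding=1, groups=channels)

x = torch.randn(batch, channels, seq_len)   # (batch, channels, seq)
y = depthwise(x)                            # still (8, 64, 128); only positions within a channel are mixed
```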
Hello! Yes, this would be a great use case. You'll want to replace the attention layer with something like this: https://github.com/HazyResearch/m2/blob/main/bert/src/mm/monarch_mixer_sequence_mixer.py
And the MLP layer with something like this: https://github.com/HazyResearch/m2/blob/main/bert/src/bert_layers.py#L297...
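A minimal sketch of where those two pieces slot in, assuming a pre-norm residual block; the sequence mixer and MLP are passed in as modules, standing in for the classes linked above (whose constructor arguments are not shown here).

```Python
import torch
from torch import nn

class M2StyleBlock(nn.Module):
    def __init__(self, d_model, sequence_mixer: nn.Module, mlp: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.sequence_mixer = sequence_mixer   # replaces the attention layer
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = mlp                         # replaces the dense MLP

    def forward(self, x):                      # x: (batch, seq, d_model)
        x = x + self.sequence_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

# Smoke test with placeholder modules; in practice pass the M2 sequence mixer and MLP.
block = M2StyleBlock(768, sequence_mixer=nn.Identity(), mlp=nn.Linear(768, 768))
out = block(torch.randn(2, 128, 768))          # (2, 128, 768)
```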
You can replace your MLP class with this one (changing the configs to however it works in your model):
```Python
import torch
from torch import nn
from src.mm.blockdiag_linear import BlockdiagLinear
...
```
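The snippet above is cut off; here is a minimal sketch of what such an MLP might look like, assuming `BlockdiagLinear` is a drop-in replacement for `nn.Linear` that also takes an `nblocks` argument. The config names (`d_model`, `expand`, `nblocks`) are assumptions, not the repo's.

```Python
from torch import nn
from src.mm.blockdiag_linear import BlockdiagLinear

class BlockdiagMLP(nn.Module):
    def __init__(self, d_model, expand=4, nblocks=4):
        super().__init__()
        # Both projections use block-diagonal weight matrices; nblocks is assumed
        # to be the keyword controlling the number of diagonal blocks.
        self.fc1 = BlockdiagLinear(d_model, expand * d_model, nblocks=nblocks)
        self.act = nn.GELU()
        self.fc2 = BlockdiagLinear(expand * d_model, d_model, nblocks=nblocks)

    def forward(self, x):   # x: (batch, seq, d_model)
        return self.fc2(self.act(self.fc1(x)))
```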
`lm` is a mapping to `long_conv_lm`: https://github.com/HazyResearch/safari/blob/main/src/utils/registry.py#L24
See this on how to train with this repo: https://github.com/HazyResearch/safari/blob/main/experiments.md
Not yet, but it's something we're very interested in looking into soon. To help us target better - what models do you want to use it for? Are they depthwise...