Anton Vlasjuk

Results 94 comments of Anton Vlasjuk

Oh yeah, one last thing: I've created separate classes for the immutability tests; it got too convoluted otherwise.

@Rocketknight1 With code duplication, do you mean across the different classes between pt/tf/np? I agree with that; it's more of a dependency check to ensure that it doesn't suddenly branch...

Yup, same typo. The shape annotation is correct, though. Another thing I've noticed in `Jamba` are these lines: https://github.com/huggingface/transformers/blob/5962d62bac850cd01ee830ffba880469338c96fd/src/transformers/models/jamba/modeling_jamba.py#L916-L920 If you remember my issue from the past ( #29526 ),...

The 0s are only there because of the backward pass, which I assume you haven't run, e.g. computing a loss from the output and calling `.backward()` on the loss. See the following example...
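A minimal sketch of what is meant here (the model and loss below are illustrative, not the original setup): parameter gradients are only populated once a loss is computed from the output and `.backward()` is called on it.

```python
import torch

# A toy model standing in for whatever model was being debugged.
model = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)
out = model(x)

# Before backward: no gradients have been populated yet.
assert all(p.grad is None for p in model.parameters())

# Any scalar loss works for illustration; here just the sum of outputs.
loss = out.sum()
loss.backward()

# After backward: gradients now exist for every parameter.
assert all(p.grad is not None for p in model.parameters())
```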

Hey @ICharlotteI, thanks for the additional information and even testing it on a different GPU. Tbh, I'm also quite confused as to what's going on then. You're completely right that...

Depending on how you reimplemented the Mamba block, I'd expect lower efficiency (memory- and speed-wise), as you won't have all the CUDA optimizations. As you find the problem to be...

If `nn.Conv1d` and the CUDA `causal_conv1d` indeed turn out to be the issue, I would even suggest submitting issues in the torch and causal_conv1d repos.
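For reference, here is a hedged NumPy sketch of the causal 1-D convolution semantics that both `nn.Conv1d` (with left padding and trimming) and the fused `causal_conv1d` kernel are expected to implement, so outputs of the two paths can be compared: each output step may only depend on current and past inputs.

```python
import numpy as np

def causal_conv1d_ref(x, w):
    """Reference causal 1-D convolution (cross-correlation style,
    like nn.Conv1d): out[t] depends only on x[t-k+1 .. t]."""
    k = len(w)
    # Left-pad with k-1 zeros so the first output sees no future inputs.
    x_pad = np.concatenate([np.zeros(k - 1), np.asarray(x, dtype=float)])
    return np.array([x_pad[t:t + k] @ w for t in range(len(x))])

# Example: kernel of ones acts as a causal running sum over a window of 2.
out = causal_conv1d_ref([1.0, 2.0, 3.0], np.array([1.0, 1.0]))
# out == [1., 3., 5.]
```

If the fused kernel and the padded `nn.Conv1d` path diverge from a plain reference like this on the same inputs, that discrepancy is exactly what would be worth reporting upstream.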

You have answered your own question at the end. `mamba-ssm` expects a working CUDA installation (>=11.6) along with an NVIDIA GPU. The error logs also tell you this: `UserWarning: mamba_ssm was...

> My pytorch is 2.2 with CUDA Version 12.2, but the error still happens. Which system are you on, and is `nvcc --version` recognized as a command in your terminal? If...

They lead to the same class being used; `AutoModelForCausalLM` is basically just a wrapper that infers your specific model (e.g. Mamba) for the given task (e.g. causal LM) (by using the...
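A minimal sketch of this dispatch pattern, with illustrative names rather than the actual `transformers` internals: the auto class keeps a mapping from config type to concrete model class and instantiates the matching one, so both entry points end up constructing the same class.

```python
# Hypothetical stand-ins for a concrete config and model class.
class MambaConfig:
    pass

class MambaForCausalLM:
    def __init__(self, config):
        self.config = config

# The auto class holds a config-type -> model-class registry.
_CAUSAL_LM_MAPPING = {MambaConfig: MambaForCausalLM}

class AutoModelForCausalLM:
    @classmethod
    def from_config(cls, config):
        # Look up the concrete class for this config and build it.
        model_cls = _CAUSAL_LM_MAPPING[type(config)]
        return model_cls(config)

model = AutoModelForCausalLM.from_config(MambaConfig())
assert isinstance(model, MambaForCausalLM)
```

This is why loading via the auto class and loading the concrete class directly yield the same object type; the auto class only adds the inference step.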