Anton Vlasjuk

Results 94 comments of Anton Vlasjuk

Oh yeah, one last thing: I've created separate classes for the immutability tests; it got too convoluted otherwise.

@Rocketknight1 With code duplication, do you mean across the different classes between pt/tf/np? I agree with that; it's more of a dependency check to ensure that it doesn't suddenly branch...

Yup, same typo. The shape annotation is correct, though. Another thing I've noticed in `Jamba` are these lines: https://github.com/huggingface/transformers/blob/5962d62bac850cd01ee830ffba880469338c96fd/src/transformers/models/jamba/modeling_jamba.py#L916-L920 If you remember my issue from the past ( #29526 ),...

The 0s are only there because of the backward pass, which I assume you haven't run, e.g. computing a loss from the output and calling `.backward()` on the loss. See the following example...
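A minimal sketch of what is meant here (the model and loss below are illustrative, not the original setup): parameter gradients are only populated once a loss is computed from the output and `.backward()` is called on it.

```python
import torch

# A toy model standing in for whatever model was being debugged.
model = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)
out = model(x)

# Before backward: no gradients have been populated yet.
assert all(p.grad is None for p in model.parameters())

# Any scalar loss works for illustration; here just the sum of outputs.
loss = out.sum()
loss.backward()

# After backward: gradients now exist for every parameter.
assert all(p.grad is not None for p in model.parameters())
```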

Hey @ICharlotteI, thanks for the additional information and even testing it on a different GPU. Tbh, I'm also quite confused as to what's going on then. You're completely right that...

Depending on how you reimplemented the Mamba block, I'd expect lower efficiency (memory- and speed-wise), as you won't have all the CUDA optimizations. As you find the problem to be...

If `nn.Conv1d` and the CUDA `causal_conv1d` indeed turn out to be the issue, I would even suggest submitting issues in the torch and causal_conv1d repos.
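For reference, here is a hedged NumPy sketch of the causal 1-D convolution semantics that both `nn.Conv1d` (with left padding and trimming) and the fused `causal_conv1d` kernel are expected to implement, so outputs of the two paths can be compared: each output step may only depend on current and past inputs.

```python
import numpy as np

def causal_conv1d_ref(x, w):
    """Reference causal 1-D convolution (cross-correlation style,
    like nn.Conv1d): out[t] depends only on x[t-k+1 .. t]."""
    k = len(w)
    # Left-pad with k-1 zeros so the first output sees no future inputs.
    x_pad = np.concatenate([np.zeros(k - 1), np.asarray(x, dtype=float)])
    return np.array([x_pad[t:t + k] @ w for t in range(len(x))])

# Example: kernel of ones acts as a causal running sum over a window of 2.
out = causal_conv1d_ref([1.0, 2.0, 3.0], np.array([1.0, 1.0]))
# out == [1., 3., 5.]
```

If the fused kernel and the padded `nn.Conv1d` path diverge from a plain reference like this on the same inputs, that discrepancy is exactly what would be worth reporting upstream.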

You have answered your own question at the end. `mamba-ssm` expects a working CUDA installation (>=11.6) along with an NVIDIA GPU. The error logs also tell you this: `UserWarning: mamba_ssm was...

> My pytorch is 2.2 with CUDA Version 12.2, but the error still happens. Which system are you on, and is `nvcc --version` recognized as a command in your terminal? If...

They lead to the same class being used; `AutoModelForCausalLM` is basically just a wrapper that infers your specific model (e.g. Mamba) for the given task (e.g. causal LM) (by using the...
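A minimal sketch of this dispatch pattern, with illustrative names rather than the actual `transformers` internals: the auto class keeps a mapping from config type to concrete model class and instantiates the matching one, so both entry points end up constructing the same class.

```python
# Hypothetical stand-ins for a concrete config and model class.
class MambaConfig:
    pass

class MambaForCausalLM:
    def __init__(self, config):
        self.config = config

# The auto class holds a config-type -> model-class registry.
_CAUSAL_LM_MAPPING = {MambaConfig: MambaForCausalLM}

class AutoModelForCausalLM:
    @classmethod
    def from_config(cls, config):
        # Look up the concrete class for this config and build it.
        model_cls = _CAUSAL_LM_MAPPING[type(config)]
        return model_cls(config)

model = AutoModelForCausalLM.from_config(MambaConfig())
assert isinstance(model, MambaForCausalLM)
```

This is why loading via the auto class and loading the concrete class directly yield the same object type; the auto class only adds the inference step.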