Mamba2 conversion script for original models

Open vasqu opened this issue 1 year ago • 4 comments

What does this PR do?

Extends the Mamba2 conversion script to support the original paper models and Codestral. I need some help with the tokenizer: specifically, how to override the padding side of a pretrained tokenizer and then save it with the new side.
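One possible workaround for the padding-side question, as a minimal sketch and not the approach this PR takes: a saved tokenizer directory normally contains a `tokenizer_config.json`, and its entries are passed to the tokenizer constructor on load, where `padding_side` is an accepted argument. The helper name `set_padding_side` and the directory layout are assumptions for illustration; the sketch only touches the JSON file and does not import `transformers`.

```python
import json
from pathlib import Path


def set_padding_side(tokenizer_dir: str, side: str) -> dict:
    """Set `padding_side` in a saved tokenizer's tokenizer_config.json.

    Hypothetical helper: assumes `tokenizer_dir` was produced by
    `save_pretrained` and therefore contains tokenizer_config.json.
    """
    if side not in ("left", "right"):
        raise ValueError(f"padding side must be 'left' or 'right', got {side!r}")

    config_path = Path(tokenizer_dir) / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))

    # Overwrite (or add) the key; all other entries are left untouched.
    config["padding_side"] = side
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Passing `padding_side` when loading the tokenizer and then calling `save_pretrained` may achieve the same result without editing files by hand; whether the value is persisted that way should be checked against the installed `transformers` version.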

Fixes #32496

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@ArthurZucker @molbap

vasqu, Aug 10 '24 15:08

Tests seem unrelated to me. I'll rebase after resolving everything in the review.

vasqu, Aug 12 '24 22:08

Yes, there are issues with tests on main right now, on it! They're unrelated to your changes. As for choosing which model to convert, I think it should be up to the user, meaning it could simply be a choice argument in the parser and that's it, right?
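A minimal sketch of what such a choice argument could look like with `argparse`. The flag name `--model_type` and the two choice values are hypothetical and not taken from the actual script.

```python
import argparse

# Hypothetical variant names; the real script may use different keys.
MODEL_CHOICES = ("mamba_ssm", "codestral")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert an original Mamba2 checkpoint (illustrative sketch)."
    )
    parser.add_argument(
        "--model_type",
        choices=MODEL_CHOICES,  # argparse rejects any value outside this tuple
        required=True,
        help="Which original checkpoint format to convert.",
    )
    return parser
```

With `choices`, an unsupported value makes the parser print a usage error and exit, so the script needs no detection logic of its own.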

molbap, Aug 13 '24 06:08

Oh yea for sure, I kinda just went ahead and tried to automate everything :facepalm: Lemme change it in the evening when I have proper time!

vasqu, Aug 13 '24 07:08

With all the fixes/patches in previous PRs, this PR should be ready now.

I think there are still two core things to consider:

  • Is the addition of the Mamba2Tokenizer fine?
  • Is the model dictionary _MAMBA2_MODELS_DICT fine with the partial functions or is this overengineered? :eyes:
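For readers unfamiliar with the pattern in the second question, here is a minimal sketch of a model dictionary built from partial functions. It is not the PR's `_MAMBA2_MODELS_DICT`: the name `_EXAMPLE_MODELS_DICT`, the helper `strip_prefix`, and the key prefixes are all hypothetical.

```python
from functools import partial


def strip_prefix(state_dict: dict, prefix: str) -> dict:
    """Return a copy of `state_dict` with `prefix` removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Each entry binds the variant-specific argument ahead of time, so callers
# only pass the state dict. Prefixes are made up for illustration.
_EXAMPLE_MODELS_DICT = {
    "mamba_ssm": partial(strip_prefix, prefix="backbone."),
    "codestral": partial(strip_prefix, prefix="model."),
}


def convert(model_type: str, state_dict: dict) -> dict:
    """Look up the converter for `model_type` and apply it."""
    try:
        converter = _EXAMPLE_MODELS_DICT[model_type]
    except KeyError:
        raise ValueError(f"unknown model type: {model_type!r}") from None
    return converter(state_dict)
```

The trade-off is the one raised above: the dictionary keeps per-model differences in one place, but a plain `if`/`elif` may be easier to read when there are only two variants.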

vasqu, Aug 20 '24 20:08

Feel free to ping me for another review once ready!

ArthurZucker, Aug 28 '24 09:08

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker Should be good for review :eyes: just checked locally to make sure it works.

vasqu, Aug 28 '24 11:08