Feature Request: Add Support for ModernBERT
Prerequisites
- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
- Add support for ModernBERT: https://huggingface.co/nomic-ai/modernbert-embed-base
Motivation
Use a more capable, modern embedding model for more accurate retrieval.
Possible Implementation
No response
https://huggingface.co/blog/modernbert
https://github.com/NoahBPeterson/llama.cpp/tree/modernbert
I did some work on this and got to the point where I could produce some non-zero output, but I couldn't get it all the way there.
If I were starting over, I would copy one of the Gemma models' implementations instead of BERT's, since ModernBERT uses Gemma's attention, which is global every n layers and local for all the others. That's what I get for not reading the ModernBERT paper closely until well into my efforts to recreate the attention mechanism myself ;)
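For reference, here is a minimal sketch of that layer pattern. This is not llama.cpp code; the "global every 3rd layer" rule (`global_attn_every_n_layers`) and the 22-layer depth for ModernBERT-base are assumptions taken from the model card/config, so treat them as illustrative only.

```cpp
#include <cstdio>

// Sketch only: pick global vs. local (sliding-window) attention per layer.
// Assumption: a layer uses full/global attention every n-th layer (n = 3 in
// the released config, "global_attn_every_n_layers") and local attention
// otherwise, similar to Gemma 2's alternating layer pattern.
static bool layer_uses_global_attn(int layer_idx, int global_every_n = 3) {
    // Layer 0 is global, then every n-th layer after that (assumption).
    return layer_idx % global_every_n == 0;
}

int main() {
    const int n_layers = 22; // ModernBERT-base depth (assumption)
    for (int il = 0; il < n_layers; ++il) {
        printf("layer %2d: %s attention\n", il,
               layer_uses_global_attn(il) ? "global" : "local (sliding window)");
    }
    return 0;
}
```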
> since ModernBERT uses Gemma's attention

Gemma's attention is causal, while, if I remember correctly, ModernBERT's is non-causal. So it would need a slight modification when creating the KQ mask.
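To illustrate the difference, here is a small standalone sketch of building a non-causal KQ mask that still applies a sliding window on local layers. It is not llama.cpp's actual mask-building code; the additive-mask convention (0.0f = attend, -INF = masked) and the window size are assumptions used for the example.

```cpp
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Sketch only: non-causal KQ mask. Unlike the causal mask of a Gemma-style
// decoder, every token may attend to every other token; on local layers only
// the sliding window limits attention.
static std::vector<float> build_kq_mask(int n_tokens, bool is_local_layer, int window) {
    std::vector<float> mask(n_tokens * n_tokens, 0.0f);
    for (int q = 0; q < n_tokens; ++q) {
        for (int k = 0; k < n_tokens; ++k) {
            // No causal condition (k <= q) here: ModernBERT is bidirectional.
            if (is_local_layer && std::abs(q - k) > window / 2) {
                mask[q * n_tokens + k] = -INFINITY;
            }
        }
    }
    return mask;
}

int main() {
    const int n_tokens = 8;
    const int window   = 4; // toy window; the real config uses 128 (assumption)
    auto mask = build_kq_mask(n_tokens, /*is_local_layer=*/true, window);
    for (int q = 0; q < n_tokens; ++q) {
        for (int k = 0; k < n_tokens; ++k) {
            printf("%c ", std::isinf(mask[q * n_tokens + k]) ? 'x' : '.');
        }
        printf("\n");
    }
    return 0;
}
```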
This issue was closed because it has been inactive for 14 days since being marked as stale.