Is there a way to contribute to this?
## Transformers

[Added logs to look into the outputs after attention and after mlp_output in transformers](https://github.com/huggingface/transformers/blob/main/src/transformers/models/modernbert/modular_modernbert.py#L739-L754)

Using the example above, I get the following:

```
Encoder before attention, 0 -...
```
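For anyone who wants to reproduce these logs without patching the library, here's a rough sketch using forward hooks instead; the `.attn`/`.mlp` module names and the checkpoint id are assumptions based on the current ModernBERT modeling code, not something this thread depends on.

```python
# Sketch only: capture per-layer attention and MLP outputs with forward hooks
# instead of editing modular_modernbert.py. The ".attn"/".mlp" suffixes and the
# checkpoint id are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Attention modules may return a tuple (hidden_states, ...).
        hidden = output[0] if isinstance(output, tuple) else output
        captured[name] = hidden.detach()
    return hook

for name, module in model.named_modules():
    if name.endswith(".attn") or name.endswith(".mlp"):
        module.register_forward_hook(make_hook(name))

inputs = tokenizer(["What is Deep Learning?"], return_tensors="pt")
with torch.no_grad():
    model(**inputs)

for name, tensor in captured.items():
    print(name, tuple(tensor.shape), tensor.flatten()[:4])
```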
I'm going to look into why there are differences here. I may not fully understand why, but I'll step through both libraries.

1. We know they're operating on similar `hidden_states` before...
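Here's a rough sketch of the comparison I have in mind, assuming the logged tensors from each library get dumped to disk (the file names are hypothetical):

```python
# Sketch only: quantify the per-layer drift between the two implementations.
# File names are hypothetical; they stand in for tensors dumped from the logs.
import torch

hf_hidden = torch.load("transformers_hidden_states.pt")
tei_hidden = torch.load("tei_hidden_states.pt")

diff = (hf_hidden.float() - tei_hidden.float()).abs()
print("max abs diff :", diff.max().item())
print("mean abs diff:", diff.mean().item())
print("allclose @ atol=1e-3:", torch.allclose(hf_hidden.float(), tei_hidden.float(), atol=1e-3))
```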
# W_o projection output during forward pass

I'm noticing a larger difference here 🤔

## Transformers

```
tensor([[[ 0.1955,  0.5709,  0.2585,  ..., -0.2987,  0.2474,  0.3109],
         [ 0.0070,  0.2721, -0.4248,  ...,
```
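To check whether the W_o projection itself is amplifying the gap or just passing through a gap that is already in its input, one thing I can try is replaying the projection in plain float32 PyTorch on both pre-projection tensors (dumped the same way as above; the file names are hypothetical):

```python
# Sketch only: apply the same W_o weight to both libraries' pre-projection
# tensors. If the output gap is much larger than the input gap, the projection
# (or its kernel/dtype) is amplifying the drift; otherwise the drift was
# already there before W_o. File names are hypothetical.
import torch

wo_weight = torch.load("wo_weight.pt").float()        # [hidden, hidden]
hf_pre = torch.load("hf_attn_pre_wo.pt").float()      # attention output before W_o
tei_pre = torch.load("tei_attn_pre_wo.pt").float()

hf_out = hf_pre @ wo_weight.T
tei_out = tei_pre @ wo_weight.T
print("input  max abs diff:", (hf_pre - tei_pre).abs().max().item())
print("output max abs diff:", (hf_out - tei_out).abs().max().item())
```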
# Separate example

Going to look at a different example because it looks like with more than 3 texts it changes things quite a bit.

## Example

Using a different...
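To pin down the batch-size effect, a simple check is to send the same pairs in requests of different sizes and compare the scores for the texts they share. A rough sketch against a local TEI instance (the port, query, and passages are placeholders; the `/rerank` payload shape follows the TEI docs):

```python
# Sketch only: score the same passages in a 4-text request and a 3-text
# request and compare. Assumes a reranker served locally by TEI on port 8080.
import requests

query = "What is Deep Learning?"  # placeholder
texts = [  # placeholder passages
    "Deep Learning is a subfield of machine learning.",
    "Cheese is made from milk.",
    "Paris is the capital of France.",
    "Neural networks with many layers are called deep.",
]

def rerank(query, texts):
    resp = requests.post(
        "http://localhost:8080/rerank",
        json={"query": query, "texts": texts},
    )
    resp.raise_for_status()
    return {r["index"]: r["score"] for r in resp.json()}

full = rerank(query, texts)
subset = rerank(query, texts[:3])
for i in range(3):
    print(f"text {i}: 4-text request={full[i]:.6f}  3-text request={subset[i]:.6f}")
```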
@Narsil is there any appetite for looking into this? Otherwise I can try to dig in further!
Interestingly, running this on the CPU yields this:

```
[{'index': 1, 'score': 0.99749833}, {'index': 3, 'score': 0.9912548}, {'index': 0, 'score': 0.010130412}, {'index': 2, 'score': 0.0005193049}]
```

It's worth investigating the...
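One way to see how much of this is a device/precision effect is to run the same pairs through Transformers on CPU float32, GPU float32, and GPU float16 and compare the scores. Rough sketch (the model id and query/passage pair are placeholders):

```python
# Sketch only: compare reranker scores across device/dtype combinations.
# The model id and the query/passage pair are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "some-org/some-reranker"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
pairs = [("What is Deep Learning?", "Deep Learning is a subfield of machine learning.")]

def score(device, dtype=torch.float32):
    model = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype=dtype)
    model = model.to(device).eval()
    inputs = tokenizer(
        [q for q, _ in pairs], [t for _, t in pairs],
        return_tensors="pt", padding=True,
    ).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.sigmoid(logits).float().cpu()

print("cpu  fp32:", score("cpu"))
if torch.cuda.is_available():
    print("cuda fp32:", score("cuda"))
    print("cuda fp16:", score("cuda", torch.float16))
```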
Hey @kozistr! Thanks for the reply above!

> which TEI uses approximated gelu while fusing layers on the backside (I might be wrong).

It does look like it's fusing:...
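For scale, here's a quick sketch of how far the tanh-approximated GELU sits from the exact one on random activations; it only illustrates the size of drift such a swap could introduce per layer, not what TEI's fused kernel actually does:

```python
# Sketch only: exact vs tanh-approximated GELU on random activations.
import torch
import torch.nn.functional as F

x = torch.randn(1024, 1024)
exact = F.gelu(x, approximate="none")
approx = F.gelu(x, approximate="tanh")

diff = (exact - approx).abs()
print("max abs diff :", diff.max().item())
print("mean abs diff:", diff.mean().item())
```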