rebased
Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models"
Suppose we have an LLM that has been pretrained with quadratic attention, and we want to extend its context size or improve performance. For this purpose we swap only the attention...
Based architecture seems to have been updated - https://arxiv.org/abs/2402.18668. Any insights into how it compares with Rebased?
Hello! The concept is awesome, and it would be nice to integrate it into the huggingface/transformers library. However, to ensure that everything works correctly and matches the paper's results, we...
Hi, I read your paper and found the following confusing. When you describe the ablations that culminate in ReBased, the sequence starts with > x^2 – substituting the original kernel function...
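For readers following the kernel-ablation discussion above, here is a minimal sketch of the idea being debated: linear attention with a squared feature map applied to queries and keys. This is an illustrative, non-causal simplification, not the repository's implementation; the function names and the learnable `gamma`/`beta` parameters are hypothetical stand-ins for whatever affine normalization the actual model uses.

```python
import torch

def squared_kernel(x, gamma, beta):
    # Hypothetical quadratic feature map in the spirit of the x^2 ablation:
    # normalize, apply a learnable affine transform, then square elementwise.
    # gamma and beta are assumed learnable per-dimension parameters.
    x = torch.nn.functional.layer_norm(x, x.shape[-1:])
    return (gamma * x + beta) ** 2

def linear_attention(q, k, v, gamma, beta, eps=1e-6):
    # O(n) attention: feature-map q and k, then exploit associativity,
    # (phi(q) @ phi(k).T) @ v == phi(q) @ (phi(k).T @ v),
    # so the n x n attention matrix is never materialized.
    phi_q = squared_kernel(q, gamma, beta)          # (n, d)
    phi_k = squared_kernel(k, gamma, beta)          # (n, d)
    kv = phi_k.T @ v                                # (d, d_v)
    z = phi_k.sum(dim=0)                            # (d,) normalizer
    out = (phi_q @ kv) / (phi_q @ z).clamp_min(eps).unsqueeze(-1)
    return out                                      # (n, d_v)
```

Because the squared feature map is nonnegative, the normalizer `phi_q @ z` is nonnegative as well, so the `clamp_min` guard is only protecting against division by zero; a causal variant would replace the global sums with prefix sums over the sequence dimension.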