rebased
Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models"
Suppose we have an LLM that has been pretrained with quadratic attention, and we want to extend its context size or improve performance. For this purpose we swap only the attention...
Based architecture seems to have been updated - https://arxiv.org/abs/2402.18668. Any insights into how it compares with Rebased?
Hello! The concept is awesome, and it would be nice to integrate it into the huggingface/transformers library. However, to ensure that everything works correctly and matches the paper's results, we...
Hi, I read your paper and found the following confusing. When you describe the ablations that culminate in ReBased, the sequence starts with > x^2 – substituting the original kernel function...
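For readers following the kernel-ablation discussion above, here is a minimal sketch of the idea being debated: linear attention with a squared feature map applied to queries and keys. This is an illustrative, non-causal simplification, not the repository's implementation; the function names and the learnable `gamma`/`beta` parameters are hypothetical stand-ins for whatever affine normalization the actual model uses.

```python
import torch

def squared_kernel(x, gamma, beta):
    # Hypothetical quadratic feature map in the spirit of the x^2 ablation:
    # normalize, apply a learnable affine transform, then square elementwise.
    # gamma and beta are assumed learnable per-dimension parameters.
    x = torch.nn.functional.layer_norm(x, x.shape[-1:])
    return (gamma * x + beta) ** 2

def linear_attention(q, k, v, gamma, beta, eps=1e-6):
    # O(n) attention: feature-map q and k, then exploit associativity,
    # (phi(q) @ phi(k).T) @ v == phi(q) @ (phi(k).T @ v),
    # so the n x n attention matrix is never materialized.
    phi_q = squared_kernel(q, gamma, beta)          # (n, d)
    phi_k = squared_kernel(k, gamma, beta)          # (n, d)
    kv = phi_k.T @ v                                # (d, d_v)
    z = phi_k.sum(dim=0)                            # (d,) normalizer
    out = (phi_q @ kv) / (phi_q @ z).clamp_min(eps).unsqueeze(-1)
    return out                                      # (n, d_v)
```

Because the squared feature map is nonnegative, the normalizer `phi_q @ z` is nonnegative as well, so the `clamp_min` guard is only protecting against division by zero; a causal variant would replace the global sums with prefix sums over the sequence dimension.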