InfiniTransformer

Unofficial PyTorch/🤗 Transformers (Gemma/Llama3) implementation of "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"

12 InfiniTransformer issues

I noticed that the memory retrieval and update happen before `apply_rotary_pos_emb`. Would the memory's lack of positional information confuse the model's perception of the order of historical information?
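
For reference, a minimal sketch of the ordering this question is about (not the repository's actual code; all names below are placeholders): the compressive-memory retrieval/update operates on pre-rotary queries and keys, and rotary position embeddings are applied only for the local attention inside the current segment.

```python
# Illustrative only: where memory retrieval/update sit relative to rotary
# position embeddings in an Infini-attention-style segment step.
import torch
import torch.nn.functional as F

def apply_rotary(x, positions, base=10000.0):
    # Minimal rotary embedding over the last dimension (head_dim must be even).
    half = x.shape[-1] // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = positions[:, None].float() * freqs[None, :]        # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def segment_step(q, k, v, memory, norm, positions):
    # 1) Memory retrieval and update use the *pre-rotary* queries and keys,
    #    mirroring the ordering the issue asks about (linear-attention form).
    sigma_q = F.elu(q) + 1.0
    sigma_k = F.elu(k) + 1.0
    retrieved = (sigma_q @ memory) / (sigma_q @ norm).clamp(min=1e-6).unsqueeze(-1)
    memory = memory + sigma_k.transpose(-2, -1) @ v
    norm = norm + sigma_k.sum(dim=-2)
    # 2) Rotary embeddings are applied only afterwards, for the local
    #    dot-product attention within the current segment (causal mask omitted).
    q_rot, k_rot = apply_rotary(q, positions), apply_rotary(k, positions)
    scores = (q_rot @ k_rot.transpose(-2, -1)) / q.shape[-1] ** 0.5
    local = torch.softmax(scores, dim=-1) @ v
    return retrieved, local, memory, norm

# Toy usage with an empty memory.
seq, dim = 2048, 128
q, k, v = (torch.randn(seq, dim) for _ in range(3))
memory, norm = torch.zeros(dim, dim), torch.zeros(dim)
retrieved, local, memory, norm = segment_step(q, k, v, memory, norm, torch.arange(seq))
```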

First of all, thank you very much for your work. I tried to train `Gemma-2B` with a 32K sequence length and a 2K segment size on a single 48 GB A6000 Ada, but...
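
As a rough illustration of what "32K sequence length with a 2K segment size" means for the data path (assumed shapes, not the repository's training script): the long sequence is split into 2K-token segments and the compressive memory is carried across them instead of attending over the full 32K at once.

```python
# Sketch only: a 32K-token batch processed as sixteen 2K-token segments.
import torch

batch_size, seq_len, segment_size = 1, 32 * 1024, 2 * 1024
input_ids = torch.randint(0, 256000, (batch_size, seq_len))

memory_state = None  # placeholder for the per-layer compressive memory
for segment in input_ids.split(segment_size, dim=1):  # 16 segments of 2048 tokens
    # A model call here would run full attention only within the 2K segment;
    # memory_state would carry information from earlier segments.
    assert segment.shape == (batch_size, segment_size)
```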

Just wondering whether Infini-attention has any limitations, such as inference speed or model performance. There is not much discussion of this in the paper.

If someone has trained these models, could they share them?

By saving the model and reloading it, I managed to get it working with both quantized and full precision (it still uses at most 10 GB of GPU RAM). However, the...
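
For anyone reproducing this, here is a sketch of the save-then-reload workflow using standard Transformers APIs; the checkpoint path is a placeholder, and the 4-bit reload assumes bitsandbytes is installed.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

save_dir = "./infini-gemma"  # placeholder path for the saved checkpoint

# Assume the patched/trained model was saved earlier with:
#   model.save_pretrained(save_dir); tokenizer.save_pretrained(save_dir)

# Reload in full (bf16) precision ...
model_fp = AutoModelForCausalLM.from_pretrained(
    save_dir, torch_dtype=torch.bfloat16, device_map="auto"
)

# ... or reload quantized (requires bitsandbytes) to cut GPU memory further.
quant_cfg = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model_4bit = AutoModelForCausalLM.from_pretrained(
    save_dir, quantization_config=quant_cfg, device_map="auto"
)
```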

Very exciting work! When will it be made public so that researchers can explore it more deeply?

What are your thoughts on adding BitLinear?
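
To make the suggestion concrete, here is a minimal BitLinear-style layer in the spirit of BitNet b1.58; it is not part of this repository and omits details such as activation quantization.

```python
# Minimal sketch of a BitLinear-style layer: ternary weights with absmean
# scaling and a straight-through estimator for the gradient.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Linear layer whose weights are quantized to {-1, 0, +1} on the fly."""

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)          # absmean scaling
        w_q = (w / scale).round().clamp(-1, 1) * scale  # ternary weights
        # Straight-through estimator: quantized forward, full-precision gradient.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)

layer = BitLinear(64, 64)
y = layer(torch.randn(2, 64))
```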

Did anyone face this issue?

warnings.warn(
Traceback (most recent call last):
  File "test_train.small.gemma.infini.py", line 150, in <module>
    trainer.train()
  File "/transformers/src/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/transformers/src/transformers/trainer.py", line 2216, in ...

I used **accelerate launch** with **ZeRO-3** to run train.llama.infini.noclm.1Mseq.sh, but I got this: **RuntimeError: Function 'LinearFunctionForZeroStage3Backward' returned nan values in its 0th output**.
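
Not a ZeRO-3-specific fix, but a generic way to localize where the NaNs first appear on a small debug run: enable PyTorch anomaly detection and attach forward hooks that flag the first module producing non-finite outputs (the helper below is illustrative, not part of the repository).

```python
import torch

torch.autograd.set_detect_anomaly(True)  # reports the op that produced NaN grads

def add_nan_hooks(model):
    """Raise as soon as any module emits a non-finite output."""
    def hook(module, inputs, output):
        tensors = output if isinstance(output, (tuple, list)) else (output,)
        for t in tensors:
            if torch.is_tensor(t) and not torch.isfinite(t).all():
                raise RuntimeError(f"Non-finite output in {module.__class__.__name__}")
    for module in model.modules():
        module.register_forward_hook(hook)

# Usage: call add_nan_hooks(model) before trainer.train() on a short run.
```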

Hi! I trained the model with LoRA and 8-bit precision down to a training loss of 1.5/2.5. The generation is segment-wise, but the model does not seem to generate correct text. It...
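
For context, a typical LoRA-on-8-bit setup with PEFT is sketched below; the base model ID and target_modules are assumptions for illustration, not taken from this repository's scripts.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",  # assumed base checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
```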