InfiniTransformer

Unofficial PyTorch/🤗 Transformers (Gemma/Llama3) implementation of "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"

12 InfiniTransformer issues

I noticed that the memory retrieval and update happen before `apply_rotary_pos_emb`. Would the memory's lack of positional information confuse the model's perception of the order of historical information?
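
For reference, a minimal sketch of the ordering this question is about (not the repository's actual code; all names below are placeholders): the compressive-memory retrieval/update operates on pre-rotary queries and keys, and rotary position embeddings are applied only for the local attention inside the current segment.

```python
# Illustrative only: where memory retrieval/update sit relative to rotary
# position embeddings in an Infini-attention-style segment step.
import torch
import torch.nn.functional as F

def apply_rotary(x, positions, base=10000.0):
    # Minimal rotary embedding over the last dimension (head_dim must be even).
    half = x.shape[-1] // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = positions[:, None].float() * freqs[None, :]        # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def segment_step(q, k, v, memory, norm, positions):
    # 1) Memory retrieval and update use the *pre-rotary* queries and keys,
    #    mirroring the ordering the issue asks about (linear-attention form).
    sigma_q = F.elu(q) + 1.0
    sigma_k = F.elu(k) + 1.0
    retrieved = (sigma_q @ memory) / (sigma_q @ norm).clamp(min=1e-6).unsqueeze(-1)
    memory = memory + sigma_k.transpose(-2, -1) @ v
    norm = norm + sigma_k.sum(dim=-2)
    # 2) Rotary embeddings are applied only afterwards, for the local
    #    dot-product attention within the current segment (causal mask omitted).
    q_rot, k_rot = apply_rotary(q, positions), apply_rotary(k, positions)
    scores = (q_rot @ k_rot.transpose(-2, -1)) / q.shape[-1] ** 0.5
    local = torch.softmax(scores, dim=-1) @ v
    return retrieved, local, memory, norm

# Toy usage with an empty memory.
seq, dim = 2048, 128
q, k, v = (torch.randn(seq, dim) for _ in range(3))
memory, norm = torch.zeros(dim, dim), torch.zeros(dim)
retrieved, local, memory, norm = segment_step(q, k, v, memory, norm, torch.arange(seq))
```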

First of all, thank you very much for your work. I tried to train `Gemma-2B` with a 32K sequence length and a 2K segment size on a single 48 GB A6000 Ada, but...
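
As a rough illustration of what "32K sequence length with a 2K segment size" means for the data path (assumed shapes, not the repository's training script): the long sequence is split into 2K-token segments and the compressive memory is carried across them instead of attending over the full 32K at once.

```python
# Sketch only: a 32K-token batch processed as sixteen 2K-token segments.
import torch

batch_size, seq_len, segment_size = 1, 32 * 1024, 2 * 1024
input_ids = torch.randint(0, 256000, (batch_size, seq_len))

memory_state = None  # placeholder for the per-layer compressive memory
for segment in input_ids.split(segment_size, dim=1):  # 16 segments of 2048 tokens
    # A model call here would run full attention only within the 2K segment;
    # memory_state would carry information from earlier segments.
    assert segment.shape == (batch_size, segment_size)
```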

Just wondering whether Infini-attention has any limitations, such as inference speed or model performance. There is not much discussion of this in the paper.

If someone has trained these models, could they share them?

By saving the model and reloading it, I managed to get it working with both quantized and full precision (it still uses at most 10 GB of GPU RAM). However, the...
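
For anyone reproducing this, here is a sketch of the save-then-reload workflow using standard Transformers APIs; the checkpoint path is a placeholder, and the 4-bit reload assumes bitsandbytes is installed.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

save_dir = "./infini-gemma"  # placeholder path for the saved checkpoint

# Assume the patched/trained model was saved earlier with:
#   model.save_pretrained(save_dir); tokenizer.save_pretrained(save_dir)

# Reload in full (bf16) precision ...
model_fp = AutoModelForCausalLM.from_pretrained(
    save_dir, torch_dtype=torch.bfloat16, device_map="auto"
)

# ... or reload quantized (requires bitsandbytes) to cut GPU memory further.
quant_cfg = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model_4bit = AutoModelForCausalLM.from_pretrained(
    save_dir, quantization_config=quant_cfg, device_map="auto"
)
```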

Very exciting work! When will it be made public so that researchers can explore it more deeply?

What are your thoughts on adding BitLinear?
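
To make the suggestion concrete, here is a minimal BitLinear-style layer in the spirit of BitNet b1.58; it is not part of this repository and omits details such as activation quantization.

```python
# Minimal sketch of a BitLinear-style layer: ternary weights with absmean
# scaling and a straight-through estimator for the gradient.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Linear layer whose weights are quantized to {-1, 0, +1} on the fly."""

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)          # absmean scaling
        w_q = (w / scale).round().clamp(-1, 1) * scale  # ternary weights
        # Straight-through estimator: quantized forward, full-precision gradient.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)

layer = BitLinear(64, 64)
y = layer(torch.randn(2, 64))
```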

Did anyone face this issue?

warnings.warn(
Traceback (most recent call last):
  File "test_train.small.gemma.infini.py", line 150, in <module>
    trainer.train()
  File "/transformers/src/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/transformers/src/transformers/trainer.py", line 2216, in ...

I used **accelerate launch** with **ZeRO-3** to run train.llama.infini.noclm.1Mseq.sh, but I got this: **RuntimeError: Function 'LinearFunctionForZeroStage3Backward' returned nan values in its 0th output**.
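
Not a ZeRO-3-specific fix, but a generic way to localize where the NaNs first appear on a small debug run: enable PyTorch anomaly detection and attach forward hooks that flag the first module producing non-finite outputs (the helper below is illustrative, not part of the repository).

```python
import torch

torch.autograd.set_detect_anomaly(True)  # reports the op that produced NaN grads

def add_nan_hooks(model):
    """Raise as soon as any module emits a non-finite output."""
    def hook(module, inputs, output):
        tensors = output if isinstance(output, (tuple, list)) else (output,)
        for t in tensors:
            if torch.is_tensor(t) and not torch.isfinite(t).all():
                raise RuntimeError(f"Non-finite output in {module.__class__.__name__}")
    for module in model.modules():
        module.register_forward_hook(hook)

# Usage: call add_nan_hooks(model) before trainer.train() on a short run.
```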

Hi! I trained the model with LoRA and 8-bit precision down to a training loss of 1.5/2.5. The generation is segment-wise, but the model does not seem to generate correct text. It...
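
For context, a typical LoRA-on-8-bit setup with PEFT is sketched below; the base model ID and target_modules are assumptions for illustration, not taken from this repository's scripts.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",  # assumed base checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
```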