transformers Documentation of `SinkCache` has bug in example code

System Info

transformers version: 4.44.0
Platform: Linux-6.8.0-40-generic-x86_64-with-glibc2.39
Python version: 3.12.3
Huggingface_hub version: 0.24.5
Safetensors version: 0.4.4
Accelerate version: 0.33.0
Accelerate config: not found
PyTorch version (GPU?): 2.4.0+cu121 (False)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?:

Who can help?

@zucchini-nlp, @gante

Information

[X] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

Run the following example code from the doc string of SinkCache (link to doc string; it was added in commit 37c5ca5eb9):

from transformers import AutoTokenizer, AutoModelForCausalLM, SinkCache

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

inputs = tokenizer(text="My name is GPT2", return_tensors="pt")

# Prepare a cache class and pass it to model's forward
past_key_values = SinkCache(window_length=256, num_sink_tokens=4)
outputs = model(**inputs, past_key_values=past_key_values, use_cache=True)
past_kv_length = outputs.past_key_values # access cache filled with key/values from generation

Obtain the following error message: TypeError: 'SinkCache' object is not subscriptable

It seems that SinkCache doesn't actually work with the GPT2 model, which apparently expects past_key_values to use the legacy "list of tuples of tensors" format. An example of a model for which SinkCache does work is Locutusque/TinyMistral-248M.
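To illustrate the kind of failure I suspect, here is a minimal sketch in plain Python. It does not use transformers at all: `ToyCache` and `read_layer` are hypothetical stand-ins, and the assumption that the model code indexes the cache by layer is mine, not something I verified in the GPT2 implementation.

```python
# Hypothetical stand-in for a cache class that does not implement __getitem__.
class ToyCache:
    def __init__(self):
        self.key_cache = []
        self.value_cache = []


# Legacy format: one (key, value) pair per layer, indexable by layer.
# Strings stand in for the tensors.
legacy_past = (("k0", "v0"), ("k1", "v1"))


def read_layer(past_key_values, layer_idx):
    # Assumption: model code written for the legacy format indexes by layer.
    return past_key_values[layer_idx]


# Works with the legacy tuple-of-tuples format.
legacy_result = read_layer(legacy_past, 0)

# Fails with an object that does not support indexing.
try:
    read_layer(ToyCache(), 0)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(legacy_result)
print(error_message)
```

The second call raises the same kind of `TypeError` as in the report, which is consistent with the model treating the cache as an indexable sequence.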

Expected behavior

No error message, since this is official example code.

Aug 21 '24 13:08 robamler

@robamler Thank you for opening this issue 🤗

@zucchini-nlp all added examples are in fact broken, gpt2 is not compatible with any of the Cache classes 😛 Could you open a PR to a) fix the examples b) make sure we run the examples as doctests in our daily CI?
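For reference, a minimal sketch of what running a docstring example as a doctest means, using only the standard-library `doctest` module. The function `cache_length` is a hypothetical toy, and this is not how the transformers CI is actually wired up; it only shows that a docstring example gets executed and its output checked.

```python
import doctest


def cache_length(cache):
    """Return the number of cached layers (hypothetical toy function).

    >>> cache_length([("k0", "v0"), ("k1", "v1")])
    2
    """
    return len(cache)


# Parse the examples out of the docstring and execute them.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    cache_length.__doc__,            # text containing the >>> examples
    {"cache_length": cache_length},  # names visible to the examples
    "cache_length",                  # test name used in reports
    None,                            # no source file
    0,                               # starting line number
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.attempted, results.failed)
```

A docstring example that raises, like the ones using gpt2 with a Cache class, would be counted in `results.failed` and could then fail the CI job.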

Aug 21 '24 17:08 gante

Indeed, the model choice was not the best, and it seems like doctests are not run by CI before merging. Will check those out.

Aug 22 '24 04:08 zucchini-nlp