Documentation of `SinkCache` has bug in example code
System Info
- `transformers` version: 4.44.0
- Platform: Linux-6.8.0-40-generic-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.24.5
- Safetensors version: 0.4.4
- Accelerate version: 0.33.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.4.0+cu121 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
Who can help?
@zucchini-nlp, @gante
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Run the following example code from the doc string of `SinkCache` (link to doc string; it was added in commit 37c5ca5eb9):
from transformers import AutoTokenizer, AutoModelForCausalLM, SinkCache
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
inputs = tokenizer(text="My name is GPT2", return_tensors="pt")
# Prepare a cache class and pass it to model's forward
past_key_values = SinkCache(window_length=256, num_sink_tokens=4)
outputs = model(**inputs, past_key_values=past_key_values, use_cache=True)
past_kv_length = outputs.past_key_values # access cache filled with key/values from generation
- Obtain the following error message:
TypeError: 'SinkCache' object is not subscriptable
It seems that `SinkCache` doesn't actually work with the GPT2 model: the model apparently still expects `past_key_values` in the legacy "list of tuples of tensors" format and tries to index into it. An example of a model for which `SinkCache` does work is Locutusque/TinyMistral-248M.
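To illustrate the suspected failure mode without loading a model, here is a minimal pure-Python sketch. `LegacyStyleLayer` and `OpaqueCache` are hypothetical stand-ins, not classes from `transformers`; the sketch only assumes that the model code indexes the cache as `past_key_values[layer][0]`, as the legacy tuple format would allow, while the cache object does not implement `__getitem__`.

```python
class LegacyStyleLayer:
    """Hypothetical stand-in for model code written against the legacy cache format."""

    def past_length(self, past_key_values):
        # Legacy format: one (keys, values) pair per layer, so the code
        # indexes layer 0 and then the keys entry of that pair.
        return len(past_key_values[0][0])


class OpaqueCache:
    """Hypothetical cache object that stores state but defines no __getitem__."""

    def __init__(self):
        self.keys = []
        self.values = []


layer = LegacyStyleLayer()

# A legacy-style cache: a tuple with one layer holding (keys, values).
legacy_cache = (([1, 2, 3], [4, 5, 6]),)
print(layer.past_length(legacy_cache))

# Passing an object without __getitem__ raises the same kind of error
# as reported above.
message = None
try:
    layer.past_length(OpaqueCache())
except TypeError as err:
    message = str(err)
    print(f"TypeError: {message}")
```

If this is what happens inside the GPT2 implementation, the fix belongs in the example (pick a model that supports cache classes) rather than in the caller's code.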
Expected behavior
No error message, since this is official example code.
@robamler Thank you for opening this issue 🤗
@zucchini-nlp all added examples are in fact broken, gpt2 is not compatible with any of the Cache classes 😛 Could you open a PR to a) fix the examples b) make sure we run the examples as doctests in our daily CI?
Indeed, the model choice was not the best, and it seems doctests are not run by CI before merging. Will check those out.
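As a rough illustration of what running docstring examples as tests buys, here is a minimal sketch using only Python's standard `doctest` module. It is not the project's actual CI setup; `add`, `broken_example`, and `count_failures` are hypothetical names made up for this sketch.

```python
import doctest


def add(a, b):
    """Return the sum of two numbers.

    >>> add(2, 3)
    5
    """
    return a + b


def broken_example():
    """Docstring whose example no longer matches the code.

    >>> broken_example()
    'old output'
    """
    return "new output"


def count_failures(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a DocTest from the docstring, exposing only the function itself.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report text; only the counts are needed here.
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted


print(count_failures(add))
print(count_failures(broken_example))
```

A check like this, run before merging, would flag a docstring example that raises or prints something other than what the docstring promises.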