KV cache in the Transformer models
❓ Questions and Help
What is your question?
I'm a beginner, so please excuse my rather naive question: does fairseq's implementation of TransformerDecoder have a KV cache? Is incremental_state the KV cache?
What's your environment?
- fairseq Version (e.g., 1.0 or main): main
- PyTorch Version (e.g., 1.0): 2.1.0
- OS (e.g., Linux): Linux
- How you installed fairseq (pip, source): source
- Python version: 3.10.6
- GPU models and configuration: the MoE model [for language modeling] and the dense decoder-only model
It seems so: incremental_state is fairseq's KV cache. During incremental decoding, MultiheadAttention reads the keys and values computed on earlier steps out of incremental_state (stored under "prev_key" / "prev_value"), appends the projections for the current token, and writes the result back, so past positions are never re-projected. Please check this: https://github.com/facebookresearch/fairseq/blob/920a548ca770fb1a951f7f4289b4d3a0c1bc226f/fairseq/model_parallel/modules/multihead_attention.py#L128
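For intuition, here is a minimal, runnable sketch of that caching pattern. It is not fairseq's actual code: the function name attend_incrementally and the layer_id key are made up for illustration; only the "prev_key" / "prev_value" names mirror what fairseq's MultiheadAttention keeps in its saved input buffer.

```python
# Minimal sketch (assumed names, not fairseq's API) of KV caching via an
# incremental_state dict: each step appends the new token's key/value to
# the tensors cached on the previous step.
import torch

def attend_incrementally(q, k_new, v_new, incremental_state, layer_id="attn0"):
    """q, k_new, v_new: (batch, heads, 1, head_dim) for the current token."""
    saved = incremental_state.setdefault(layer_id, {})
    if "prev_key" in saved:
        # Reuse the cached keys/values; only the current step is new.
        k = torch.cat([saved["prev_key"], k_new], dim=2)
        v = torch.cat([saved["prev_value"], v_new], dim=2)
    else:
        k, v = k_new, v_new
    # Write the extended cache back for the next decoding step.
    saved["prev_key"], saved["prev_value"] = k, v

    attn = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v

# Usage: decode token by token, threading the same incremental_state through.
incremental_state = {}
for step in range(3):
    q = torch.randn(1, 4, 1, 16)  # one new query per step
    k = torch.randn(1, 4, 1, 16)
    v = torch.randn(1, 4, 1, 16)
    out = attend_incrementally(q, k, v, incremental_state)
    print(step, incremental_state["attn0"]["prev_key"].shape)  # seq dim grows 1, 2, 3
```

In fairseq the same dict is threaded through the whole decoder forward pass, so every attention layer keeps its own cache entry keyed by module.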