QuaRot
How is perplexity calculated with the KV cache?
I've noticed that QuaRot and other KV cache papers report perplexity, but it is unclear to me how a quantized KV cache is used during the perplexity calculation. Do you have a detailed writeup of how you calculate perplexity (PPL) with the quantized KV cache? Thanks.
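To make the question concrete, here is a minimal sketch of what I assume might be happening. This is a guess and not confirmed as QuaRot's method. It assumes that keys and values are quantized and immediately dequantized ("fake quantization") inside attention during an ordinary teacher-forced forward pass, and that perplexity is then the exponential of the mean negative log-likelihood as usual. The helper names `fake_quantize`, `attention`, and `perplexity`, the per-vector asymmetric rounding scheme, and the `kv_bits` parameter are all hypothetical and chosen only for illustration.

```python
import math


def fake_quantize(values, bits=4):
    """Hypothetical helper: asymmetric round-to-nearest quantization of one
    vector, followed by dequantization back to floats."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant vector: nothing to quantize
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    return [round((v - lo) / scale) * scale + lo for v in values]


def attention(query, keys, values, kv_bits=None):
    """Single-head attention for one query over the cached keys/values.

    If kv_bits is set, every cached key and value vector is fake-quantized
    first (assumption: this is where KV cache quantization enters)."""
    if kv_bits is not None:
        keys = [fake_quantize(k, kv_bits) for k in keys]
        values = [fake_quantize(v, kv_bits) for v in values]
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood of the target tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

If this guess is right, no autoregressive decoding is involved: the evaluation text is fed through the model once, and the only change from the full-precision run is that attention sees quantized-then-dequantized keys and values. Is that what you do, or is the cache actually stored in low precision and read back token by token?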