cold-compress

Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.

Results 12 cold-compress issues

Does your code have a function to analyze attention scores, or do they need to be observed in the **Transformer** class?

Thank you for providing the code to easily test various KV-related algorithms. I have a question regarding evaluation. I compared evaluations using TruthfulQA. Accuracy was recorded in `truthfulqa_metrics.json`. When the...

Hello, I see SnapKV is used for the Heavy Hitter Prompt Compression strategy. As far as I understand (correct me if I'm wrong), it is also used in the benchmarks...

I'm getting the error below when running any torch code. This is probably due to an incompatible CUDA version (requirements.txt specifies cu121). I would suggest to - either add a...

Other modifications worth mentioning: - Changed `scripts/convert_hf_checkpoint.py` to support loading of finetuned Llama-3 models from safetensors state dict - Added finetuned configs to `model.py` (Finetuned models use a vocab size...

https://arxiv.org/abs/2407.21018

https://arxiv.org/abs/2405.12532

[InfLLM](https://arxiv.org/abs/2402.04617)

Implement this [paper](https://arxiv.org/abs/2407.02490). Similar to `class KVCacheFastGen` in that it involves a profiling step.