cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
Does your code have a function to analyze attention scores, or do they need to be observed in the **Transformer** class?
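One way to observe attention scores without modifying the **Transformer** class is a PyTorch forward hook. The sketch below is illustrative only (it uses a plain `nn.MultiheadAttention`, not cold-compress's modules); the hook and variable names are assumptions:

```python
import torch
import torch.nn as nn

# Captured attention weights land here, keyed for later inspection.
captured = {}

def save_attention(module, args, kwargs, output):
    # nn.MultiheadAttention returns (attn_output, attn_weights)
    # when need_weights=True; stash the weights for analysis.
    captured["weights"] = output[1].detach()

attn = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
handle = attn.register_forward_hook(save_attention, with_kwargs=True)

x = torch.randn(1, 5, 16)  # (batch, seq_len, embed_dim)
attn(x, x, x, need_weights=True)
handle.remove()

print(captured["weights"].shape)  # averaged over heads: (batch, L, S)
```

The same pattern would apply to any attention module that returns its weights; if the module only exposes fused attention (e.g. SDPA), you would instead need to recompute or expose the scores inside the forward pass.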
Thank you for providing the code to easily test various KV-related algorithms. I have a question regarding evaluation. I compared evaluations on TruthfulQA; accuracy was recorded in "truthfulqa_metrics.json". When the...
SnapKV
Hello, I see SnapKV is used for the Heavy Hitter Prompt Compression strategy. As far as I understand (correct me if I'm wrong), it is also used in the benchmarks...
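For context, the core of a SnapKV-style heavy-hitter selection can be sketched as follows. This is a simplified illustration under my own assumptions (pooling over heads, no kernel-size smoothing of the votes), not the cold-compress implementation:

```python
import torch

def snapkv_select(attn: torch.Tensor, window: int, budget: int) -> torch.Tensor:
    """Score each prefix position by the attention it receives from the
    last `window` query positions (the observation window), then keep the
    top-`budget` prefix positions as heavy hitters.

    attn: (num_heads, q_len, k_len) attention weights for the prompt.
    Returns sorted indices of the kept prefix keys.
    """
    prefix_len = attn.shape[-1] - window
    # Votes: attention mass from the observation window, pooled over heads.
    votes = attn[:, -window:, :prefix_len].sum(dim=(0, 1))  # (prefix_len,)
    keep = torch.topk(votes, k=min(budget, prefix_len)).indices
    return torch.sort(keep).values

# Toy usage: 2 heads, 10 prompt tokens, 4-token observation window.
scores = torch.softmax(torch.randn(2, 10, 10), dim=-1)
kept = snapkv_select(scores, window=4, budget=3)
```

The kept indices (plus the observation window itself) would then index into the KV cache before generation begins; the actual SnapKV paper additionally max-pools the votes to retain clusters of neighboring tokens.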
I'm getting the error below when running any torch code. This is probably due to an incompatible CUDA version (requirements.txt specifies cu121). I would suggest to either add a...
Other modifications worth mentioning: - Changed `scripts/convert_hf_checkpoint.py` to support loading of finetuned Llama-3 models from safetensors state dict - Added finetuned configs to `model.py` (Finetuned models use a vocab size...
https://arxiv.org/abs/2407.21018
https://arxiv.org/abs/2405.12532
[InfLLM](https://arxiv.org/abs/2402.04617)
Implement this [paper](https://arxiv.org/abs/2407.02490). Similar to `class KVCacheFastGen` in that it involves a profiling step.
untested