cold-compress

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Open griff4692 opened this issue 1 year ago • 0 comments

Implement this paper.

Similar to the `KVCacheFastGen` class in that it involves a profiling step.
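For discussion, here is a rough sketch of what a per-head profiling step could look like. It is not the paper's kernel-aware search and it does not use any cold-compress API. All names (`causal_attention`, `a_shape_mask`, `vertical_slash_mask`, `profile_head`) and all budgets (`sink`, `window`, `n_vertical`, `n_slash`) are hypothetical. The idea is simply to score a few candidate sparse patterns by how much of the dense attention mass each one recovers on a calibration input, then record the best pattern per head. As a simplification, the candidates are not matched on FLOPs.

```python
import numpy as np


def causal_attention(q, k):
    """Dense causal softmax attention weights for one head (calibration only)."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    causal = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(causal, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def a_shape_mask(n, sink=4, window=16):
    """Keep the first `sink` key positions plus a local window (assumed budgets)."""
    i, j = np.indices((n, n))
    return (j <= i) & ((j < sink) | (i - j < window))


def vertical_slash_mask(attn, n_vertical=8, n_slash=8):
    """Keep the heaviest key columns and the heaviest diagonals (assumed budgets)."""
    n = attn.shape[0]
    i, j = np.indices((n, n))
    causal = j <= i
    offsets = i - j  # diagonal index: 0 is the main diagonal
    cols = np.argsort(attn.sum(axis=0))[-n_vertical:]
    diag_mass = np.bincount(offsets[causal], weights=attn[causal], minlength=n)
    diags = np.argsort(diag_mass)[-n_slash:]
    return causal & (np.isin(j, cols) | np.isin(offsets, diags))


def profile_head(attn, candidates):
    """Return the candidate with the highest attention-mass recall, plus all recalls."""
    total = attn.sum()
    recall = {name: float((attn * mask).sum() / total)
              for name, mask in candidates.items()}
    return max(recall, key=recall.get), recall


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 128, 32
    attn = causal_attention(rng.normal(size=(n, d)), rng.normal(size=(n, d)))
    candidates = {
        "a_shape": a_shape_mask(n),
        "vertical_slash": vertical_slash_mask(attn),
    }
    best, recall = profile_head(attn, candidates)
    print(best, recall)
```

In a real implementation the result would presumably be saved as a per-layer, per-head config and loaded at prefill time, which is the part that resembles a profiling pass.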

griff4692, Jul 10 '24 12:07