SnapKV

Question on GQA implementation

Open • cyLi-Tiger opened this issue 1 year ago • 1 comment

In GQA, only one copy of the KV cache is stored per group, but SnapKV stores the KV cache with num_key_value_heads * num_key_value_groups heads. Admittedly, during KV cache eviction the positions selected may differ between query heads in the same group, but storing a separate copy per query head multiplies the memory cost by num_key_value_groups. Is there a way to solve this?
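To make the memory concern concrete, here is a rough sketch of the arithmetic, plus one possible way to share a single selection per group. The configuration values and the helper names `kv_cache_elements` and `shared_topk_indices` are made up for illustration, and averaging scores across the query heads of a group is only an assumption about how a shared selection could work, not something SnapKV is known to do.

```python
def kv_cache_elements(num_heads_stored, kept_tokens, head_dim):
    """Number of elements in K plus V for one layer and one sequence."""
    return 2 * num_heads_stored * kept_tokens * head_dim


# Hypothetical GQA configuration (assumed values, not taken from the issue).
num_key_value_heads = 8
num_key_value_groups = 4   # query heads that share each KV head
head_dim = 128
kept_tokens = 1024         # tokens retained after eviction

# One compressed cache per query head (what the issue describes).
per_query_head = kv_cache_elements(
    num_key_value_heads * num_key_value_groups, kept_tokens, head_dim
)
# One compressed cache per KV head (plain GQA storage).
per_kv_head = kv_cache_elements(num_key_value_heads, kept_tokens, head_dim)

ratio = per_query_head // per_kv_head  # equals num_key_value_groups


def shared_topk_indices(group_scores, k):
    """Choose one set of positions for a whole group.

    group_scores holds one list of per-position scores for each query head
    in the group. Scores are averaged over the heads (an assumed pooling
    choice), then the k highest-scoring positions are returned in order.
    """
    seq_len = len(group_scores[0])
    mean = [
        sum(head[i] for head in group_scores) / len(group_scores)
        for i in range(seq_len)
    ]
    top = sorted(range(seq_len), key=lambda i: mean[i], reverse=True)[:k]
    return sorted(top)
```

With a shared selection like this, every query head in a group would read the same retained positions, so only num_key_value_heads copies would need to be stored. The trade-off is that a position important to only one head in the group may be evicted.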

cyLi-Tiger avatar Jun 17 '24 10:06 cyLi-Tiger

Same question

pengshuang avatar Jun 17 '24 11:06 pengshuang