
Incomplete implementation of SparseGEMV

Open kyang-06 opened this issue 4 months ago • 3 comments

https://github.com/FasterDecoding/TEAL/blob/fb7373c93ac3594817c9ee64d4e08b47430a1822/kernels/sparse_gemv.py#L271

Hi, I noticed that the SparseGEMV kernel only handles the case where batch_size=1 and seqlen=1. Outside that case, the kernel produces wrong results.

Is it expected that this kernel only works for the decoding stage? If so, where is the implementation for Appendix A.4?
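For checking kernel output on multi-token inputs, a plain reference can help. The following is a minimal pure-Python sketch, not the TEAL Triton kernel: the names `sparse_gemv` and `sparse_gemm_rows` and the magnitude `threshold` are hypothetical, and it assumes each token row gets its own activation mask.

```python
# Illustrative reference only; not the TEAL Triton kernel.
# Function names and the threshold parameter are hypothetical.

def sparse_gemv(W, x, threshold):
    """Compute y = W @ x, skipping columns j where |x[j]| <= threshold."""
    active = [j for j, v in enumerate(x) if abs(v) > threshold]
    return [sum(row[j] * x[j] for j in active) for row in W]

def sparse_gemm_rows(W, X, threshold):
    """Apply sparse_gemv to each row of X, so every token has its own mask."""
    return [sparse_gemv(W, x, threshold) for x in X]

W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
X = [[0.0, 1.0, 0.0],   # token 0: only column 1 is active
     [2.0, 0.0, 0.0]]   # token 1: only column 0 is active

out = sparse_gemm_rows(W, X, 0.5)
print(out)
```

Comparing a kernel's output against a reference like this for batch_size > 1 or seqlen > 1 would show whether the mismatch comes from the per-token masks.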

kyang-06 avatar Sep 02 '25 03:09 kyang-06

They mention in A.4 that a single-batch setting is used. That said, I don't think comparing against 2:4 sparsity is appropriate here, since 2:4 sparsity is not well suited to small-batch matmuls.

nil0x9 avatar Sep 10 '25 13:09 nil0x9

I think this method (and therefore the kernel) focuses only on the single-batch setting in the decoding stage, as other activation-sparsity methods do. I think the advantage of activation sparsity is latency, not throughput, so Appendix A.4 may be about latency (reported as speedup).

quaternior avatar Sep 22 '25 05:09 quaternior

Yes, it's primarily a decoding kernel. A.4 is based on loss evals, where the sparsity (both weight and activation) is simulated.
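As a rough illustration of what "simulated" sparsity can mean here, the sketch below zeroes low-magnitude activations and then runs an ordinary dense product, so no sparse kernel is involved. It is an assumption about the general approach, not the repository's evaluation code: the function names and the threshold value are hypothetical, and only the activation side is shown.

```python
# Illustrative sketch only; names and threshold are hypothetical.

def simulate_activation_sparsity(x, threshold):
    """Zero every activation whose magnitude is at or below the threshold."""
    return [v if abs(v) > threshold else 0.0 for v in x]

def dense_matvec(W, x):
    """Ordinary dense matrix-vector product; no entries are skipped."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [0.1, -2.0, 0.3]

x_sparse = simulate_activation_sparsity(x, 0.5)  # small entries become 0.0
y = dense_matvec(W, x_sparse)
print(x_sparse, y)
```

Because the masked vector goes through a dense product, this measures the effect on model quality only and says nothing about speed.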

chromecast56 avatar Sep 26 '25 00:09 chromecast56