Incomplete implementation of SparseGEMV
https://github.com/FasterDecoding/TEAL/blob/fb7373c93ac3594817c9ee64d4e08b47430a1822/kernels/sparse_gemv.py#L271
Hi, I noticed that the SparseGEMV kernel only handles the case where batch_size=1 and seqlen=1. Beyond that, the kernel produces wrong outputs (see the repro sketch below).
Is it expected that this kernel only works for the decoding stage? If so, where is the implementation for Appendix A.4?
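For reference, here is a minimal repro sketch of what I mean. The entry point `sparse_gemv(x, W, threshold)` is a placeholder name/signature on my part, not necessarily what kernels/sparse_gemv.py actually exports; adjust the import to match:

```python
import torch
# from kernels.sparse_gemv import sparse_gemv  # assumed import path / name

def dense_reference(x, W, threshold):
    # Zero low-magnitude activations, then run an ordinary dense matmul.
    x_sparse = torch.where(x.abs() >= threshold, x, torch.zeros_like(x))
    return x_sparse @ W.t()

torch.manual_seed(0)
W = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

# Only the (batch=1, seqlen=1) case matches the dense reference;
# the other shapes come back wrong.
for batch, seqlen in [(1, 1), (1, 8), (4, 1)]:
    x = torch.randn(batch, seqlen, 4096, device="cuda", dtype=torch.float16)
    ref = dense_reference(x, W, threshold=0.5)
    # out = sparse_gemv(x, W, threshold=0.5)  # kernel under test
    # print((batch, seqlen), torch.allclose(out, ref, atol=1e-2))
```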
They mention in A.4 that a single-batch setting is used. That said, I don't think it's appropriate to compare against 2:4 sparsity here, since 2:4 sparsity is not a good fit for small-batch matmuls.
I think this method (and therefore the kernel) only targets the single-batch setting in the decoding stage, as other activation-sparsity methods do as well. The advantage of activation sparsity is latency, not throughput: at batch size 1 the GEMV is memory-bound, so skipping the weight columns that correspond to zeroed activations directly cuts memory traffic. Appendix A.4 is probably focused on latency (i.e., speedup).
Yes, it's primarily a decoding kernel. A.4 is based on loss evals, where the sparsity (both weight and activation) is simulated.
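For anyone else reading: "simulated" sparsity like this can be done entirely in PyTorch by masking tensors and running ordinary dense matmuls, so the evals measure model quality rather than kernel speed. A minimal sketch of what that could look like (function names are mine, and I'm assuming a magnitude-based 2:4 mask, which may differ from what the paper did):

```python
import torch

def simulate_activation_sparsity(x: torch.Tensor, threshold: float) -> torch.Tensor:
    # TEAL-style activation sparsity: zero entries below the magnitude threshold.
    return torch.where(x.abs() >= threshold, x, torch.zeros_like(x))

def simulate_24_weight_sparsity(W: torch.Tensor) -> torch.Tensor:
    # 2:4 semi-structured sparsity: in each group of 4 consecutive weights,
    # keep the 2 with the largest magnitude and zero the rest.
    out_f, in_f = W.shape
    groups = W.reshape(out_f, in_f // 4, 4)
    keep = groups.abs().topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(-1, keep, True)
    return (groups * mask).reshape(out_f, in_f)
```

The masked tensors then feed normal dense matmuls, so no sparse kernel is involved in producing the A.4 loss numbers.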