divchenko

3 issues by divchenko

I'm doing a 4-bit A × fp16 B matmul with a large A and a small B. I expect it to beat an fp8 matmul, since it should be memory-bound. In reality, it seems...

question
? - Needs Triage
inactive-30d
inactive-90d
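The memory-bound argument in the first issue can be made concrete with a rough traffic count. This is a minimal sketch with hypothetical shapes (the issue snippet doesn't give them): it assumes every operand is read or written exactly once, ignores quantization scales and zero-points, and assumes an fp16 output, so it only describes what an ideal, purely bandwidth-bound kernel could achieve.

```python
# Back-of-the-envelope traffic model for a memory-bound matmul C = A @ B.
# Assumptions (hypothetical, not from the issue): A is (m, k), B is (k, n),
# each operand is moved to/from DRAM exactly once, quantization scales are
# ignored, and the output is fp16.

def matmul_traffic_bytes(m, n, k, a_bits, b_bits, out_bits):
    """Ideal DRAM traffic in bytes for one matmul."""
    bits = m * k * a_bits + k * n * b_bits + m * n * out_bits
    return bits / 8

def matmul_flops(m, n, k):
    """Each multiply-add counts as 2 FLOPs."""
    return 2 * m * n * k

# Hypothetical shapes: large A (8192 x 8192), small B (8192 x 16).
m, n, k = 8192, 16, 8192

w4a16 = matmul_traffic_bytes(m, n, k, a_bits=4, b_bits=16, out_bits=16)
fp8 = matmul_traffic_bytes(m, n, k, a_bits=8, b_bits=8, out_bits=16)

print(f"4-bit A x fp16 B: {w4a16:,.0f} bytes, "
      f"{matmul_flops(m, n, k) / w4a16:.1f} FLOP/byte")
print(f"fp8 A x fp8 B:    {fp8:,.0f} bytes, "
      f"{matmul_flops(m, n, k) / fp8:.1f} FLOP/byte")
print(f"ideal speedup if purely bandwidth-bound: {fp8 / w4a16:.2f}x")
```

Because B is small, A dominates the traffic in this model, so halving A's width roughly halves the ideal runtime. Dequantization work in the inner loop, or a kernel that doesn't saturate bandwidth, would eat into that margin.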

**Describe the bug** Currently, the example initializes Q to quite small values (mean -1, stddev 1). If I initialize Q to somewhat larger values (e.g. stddev 100), split-kv stops working....

bug
? - Needs Triage
inactive-30d

I'm trying to get the best memory bandwidth from a B200 NVFP4 grouped (ptr-based) GEMM. I'm running example 75 with:

```
./75_blackwell_grouped_gemm_block_scaled --m=16 --n=2048 --k=7168 --groups=32
./75_blackwell_grouped_gemm_block_scaled --m=16 --n=7168 --k=2048 --groups=32...
```

question
? - Needs Triage
inactive-30d
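For the shapes in the third issue, a rough arithmetic-intensity estimate shows why memory bandwidth is the relevant metric. This is a sketch under assumptions not confirmed by the issue: both A (m × k) and B (n × k) are NVFP4, taken here as 4-bit E2M1 values plus one 8-bit scale per 16 elements along K (4.5 bits per element); the output is 16-bit; scale-factor padding, the per-tensor scale, and any C input are ignored. Note that both commands use the same n × k, so the weight traffic is identical and only the small A and output terms differ.

```python
# Rough arithmetic-intensity estimate for the grouped GEMM shapes in the issue.
# Assumptions (not confirmed by the issue): A (m x k) and B (n x k) are both
# NVFP4 = 4-bit values + one 8-bit scale per 16 elements (4.5 bits/element);
# output is 16-bit; scale padding, per-tensor scales and C input are ignored.

def nvfp4_bytes(num_elems):
    """Bytes for num_elems NVFP4 values: 4.5 bits = 9/16 byte per element."""
    assert num_elems % 16 == 0, "assumes whole 16-element scale blocks"
    return num_elems * 9 // 16

def grouped_gemm_traffic(m, n, k, groups, out_bytes=2):
    """Ideal DRAM bytes: each group's A, B and output moved exactly once."""
    per_group = nvfp4_bytes(m * k) + nvfp4_bytes(n * k) + m * n * out_bytes
    return per_group * groups

def grouped_gemm_flops(m, n, k, groups):
    """Each multiply-add counts as 2 FLOPs."""
    return 2 * m * n * k * groups

for m, n, k in [(16, 2048, 7168), (16, 7168, 2048)]:
    traffic = grouped_gemm_traffic(m, n, k, groups=32)
    intensity = grouped_gemm_flops(m, n, k, groups=32) / traffic
    print(f"m={m} n={n} k={k}: {traffic / 2**20:.1f} MiB, "
          f"{intensity:.1f} FLOP/byte")
```

With m=16, the 32 B matrices account for almost all of the traffic in this model, so achieved bytes per second (traffic divided by measured kernel time) is the number to compare against the device's peak bandwidth. The estimate says nothing about why a particular kernel configuration falls short of it.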