Alex Yang

9 comments by Alex Yang

I have a repro; there appears to be an OOM in this test. Will continue looking to give a more concrete root cause.

Notes: baseline

```
pytest tests/moe/test_trtllm_gen_fused_moe.py -k SwiGlu-NoShuffle_MajorK-DSv3-FP8_Block-1024-1024 -v
tests/moe/test_trtllm_gen_fused_moe.py::test_moe_quantization_classes[SwiGlu-NoShuffle_MajorK-DSv3-FP8_Block-1024-1024-1]...
```

Also observed this, same as in the description:

```
/repos/flashinfer/csrc/trtllm_fused_moe_dev_kernel.cu:178 executing 'cudaLaunchKernelEx(&config, kernelTyped, params)': inv
```

The OOM may just be a distraction from my compute-sanitizer usage:

```
gmon pytest tests/moe/test_trtllm_gen_fused_moe.py::test_moe_quantization_classes[SwiGlu-NoShuffle_MajorK-DSv3-FP8_Block-1024-1024-65536]
Peak Memory: 19745.0 MiB
```

Will look into trtllm_fused_moe_dev_kernel.cu:178; this is the SwiGLU activation.
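For reference, a minimal elementwise sketch of what a SwiGLU gated activation computes (illustrative only, not the kernel's actual code; the `gate`/`up` naming is my own):

```
// Illustrative only: SwiGLU gating, out = silu(gate) * up,
// where silu(x) = x * sigmoid(x).
__device__ __forceinline__ float swiglu(float gate, float up) {
  float silu = gate / (1.0f + expf(-gate));  // x * sigmoid(x) rewritten as x / (1 + e^-x)
  return silu * up;
}
```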

```
if (data.mUseDeepSeekFp8) {
  int const numThreads = 128;
  const dim3 grid(data.innerDim / 128, data.topK, data.numTokens);
  std::cout
```
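For context (my own check, not from the issue): CUDA caps gridDim.y and gridDim.z at 65535 while gridDim.x can go up to 2^31-1, so putting numTokens on the z axis breaks once it reaches 65536. A quick standalone way to confirm the limits on a given device:

```
// Standalone check of the per-device grid-size limits (assumed helper, not repro code).
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, 0);
  printf("maxGridSize: x=%d y=%d z=%d\n",
         prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
  // Typically prints: x=2147483647 y=65535 z=65535
  return 0;
}
```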

Root cause: grid.z is out of bounds (CUDA limits gridDim.z to 65535, so num_tokens=65536 overflows it). `num_tokens=64*1024-16` appears to work for me and may be the problem-size ceiling due to how activationDeepSeekKernel is launched. We can probably add...
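One possible direction, sketched under my own assumptions (the parameter values and kernel body are placeholders, not the actual flashinfer kernel or its fix): move the large token count onto grid.x, which is not subject to the 65535 cap, and keep the small innerDim/topK factors on y/z.

```
// Hypothetical sketch of a launch that avoids the gridDim.z cap by putting the
// token dimension on grid.x; the kernel body is a placeholder, not SwiGLU.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void activationSketchKernel(float* out, float const* in,
                                       int innerDim, int topK, int numTokens) {
  int token  = blockIdx.x;                              // tokens on x: up to 2^31-1 blocks
  int expert = blockIdx.y;                              // topK on y (small)
  int col    = blockIdx.z * blockDim.x + threadIdx.x;   // innerDim chunks on z (small)
  if (token >= numTokens || col >= innerDim) return;
  size_t idx = (static_cast<size_t>(token) * topK + expert) * innerDim + col;
  out[idx] = in[idx];                                   // placeholder for the real activation math
}

int main() {
  // Illustrative sizes only; 65536 tokens is the value that breaks a z-axis launch.
  int const numTokens = 64 * 1024, topK = 2, innerDim = 1024, numThreads = 128;

  size_t n = static_cast<size_t>(numTokens) * topK * innerDim;
  float *in = nullptr, *out = nullptr;
  cudaMalloc(&in, n * sizeof(float));
  cudaMalloc(&out, n * sizeof(float));

  dim3 grid(numTokens, topK, innerDim / numThreads);    // big dimension on x
  activationSketchKernel<<<grid, numThreads>>>(out, in, innerDim, topK, numTokens);
  printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));
  cudaDeviceSynchronize();

  cudaFree(in);
  cudaFree(out);
  return 0;
}
```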

This is functional: https://github.com/flashinfer-ai/flashinfer/pull/2171
Raising it as a proposed solution for what we needed when upgrading to nvidia-cutlass-dsl 4.3.1: https://github.com/NVIDIA/cutlass/issues/2845
Kind regards from FlashInfer & cuDNN :)

Sounds good, will discuss with you over Slack. Will learn about the new kernel example and bring action items back to FI.