Alex Yang

9 comments by Alex Yang

I have a repro; there appears to be an OOM in this test. Will continue looking to give a more concrete root cause.

Notes: baseline

```
pytest tests/moe/test_trtllm_gen_fused_moe.py -k SwiGlu-NoShuffle_MajorK-DSv3-FP8_Block-1024-1024 -v
tests/moe/test_trtllm_gen_fused_moe.py::test_moe_quantization_classes[SwiGlu-NoShuffle_MajorK-DSv3-FP8_Block-1024-1024-1]...
```

Also observed this, same as in the description:

```
/repos/flashinfer/csrc/trtllm_fused_moe_dev_kernel.cu:178 executing 'cudaLaunchKernelEx(&config, kernelTyped, params)': inv
```

The OOM may just be a distraction from my compute-sanitizer usage:

```
gmon pytest tests/moe/test_trtllm_gen_fused_moe.py::test_moe_quantization_classes[SwiGlu-NoShuffle_MajorK-DSv3-FP8_Block-1024-1024-65536]
Peak Memory: 19745.0 MiB
```

Will look into trtllm_fused_moe_dev_kernel.cu:178; this is the SwiGLU activation.
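For reference, a minimal elementwise sketch of what a SwiGLU gated activation computes (illustrative only, not the kernel's actual code; the `gate`/`up` naming is my own):

```
// Illustrative only: SwiGLU gating, out = silu(gate) * up,
// where silu(x) = x * sigmoid(x).
__device__ __forceinline__ float swiglu(float gate, float up) {
  float silu = gate / (1.0f + expf(-gate));  // x * sigmoid(x) rewritten as x / (1 + e^-x)
  return silu * up;
}
```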

```
if (data.mUseDeepSeekFp8) {
  int const numThreads = 128;
  const dim3 grid(data.innerDim / 128, data.topK, data.numTokens);
  std::cout
```
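For context (my own check, not from the issue): CUDA caps gridDim.y and gridDim.z at 65535 while gridDim.x can go up to 2^31-1, so putting numTokens on the z axis breaks once it reaches 65536. A quick standalone way to confirm the limits on a given device:

```
// Standalone check of the per-device grid-size limits (assumed helper, not repro code).
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, 0);
  printf("maxGridSize: x=%d y=%d z=%d\n",
         prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
  // Typically prints: x=2147483647 y=65535 z=65535
  return 0;
}
```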

Root cause: grid.z is out of bounds (CUDA limits gridDim.z to 65535, so num_tokens=65536 overflows it). `num_tokens=64*1024-16` appears to work for me and may be the problem-size ceiling due to how activationDeepSeekKernel is launched. We can probably add...
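One possible direction, sketched under my own assumptions (the parameter values and kernel body are placeholders, not the actual flashinfer kernel or its fix): move the large token count onto grid.x, which is not subject to the 65535 cap, and keep the small innerDim/topK factors on y/z.

```
// Hypothetical sketch of a launch that avoids the gridDim.z cap by putting the
// token dimension on grid.x; the kernel body is a placeholder, not SwiGLU.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void activationSketchKernel(float* out, float const* in,
                                       int innerDim, int topK, int numTokens) {
  int token  = blockIdx.x;                              // tokens on x: up to 2^31-1 blocks
  int expert = blockIdx.y;                              // topK on y (small)
  int col    = blockIdx.z * blockDim.x + threadIdx.x;   // innerDim chunks on z (small)
  if (token >= numTokens || col >= innerDim) return;
  size_t idx = (static_cast<size_t>(token) * topK + expert) * innerDim + col;
  out[idx] = in[idx];                                   // placeholder for the real activation math
}

int main() {
  // Illustrative sizes only; 65536 tokens is the value that breaks a z-axis launch.
  int const numTokens = 64 * 1024, topK = 2, innerDim = 1024, numThreads = 128;

  size_t n = static_cast<size_t>(numTokens) * topK * innerDim;
  float *in = nullptr, *out = nullptr;
  cudaMalloc(&in, n * sizeof(float));
  cudaMalloc(&out, n * sizeof(float));

  dim3 grid(numTokens, topK, innerDim / numThreads);    // big dimension on x
  activationSketchKernel<<<grid, numThreads>>>(out, in, innerDim, topK, numTokens);
  printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));
  cudaDeviceSynchronize();

  cudaFree(in);
  cudaFree(out);
  return 0;
}
```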

This is functional: https://github.com/flashinfer-ai/flashinfer/pull/2171
Raising it as a proposed solution for what we needed when upgrading to nvidia-cutlass-dsl 4.3.1: https://github.com/NVIDIA/cutlass/issues/2845
Kind regards from FlashInfer & cuDNN :)

Sounds good, will discuss with you over Slack. Will learn about the new kernel example and bring action items back to FI.