functionstackx

Results: 29 comments of functionstackx

cuSPARSELt matmul example not working on M=N=K8192

Hi fbusato, thanks for your suggestion. 1. I believe I am already running the autotuning function `cusparseLtMatmulSearch`. Is there another function that I am missing? https://github.com/OrenLeung/CUDALibrarySamples/blob/e3cfb07e6b6625ec33b8526d82bebd5a21185624/cuSPARSELt/matmul/matmul_example.cpp#L348 2. I have already...

cuSPARSELt matmul example not working on M=N=K8192

It seems that when the inputs are changed to a normal distribution centered around 0, the sparse performance gets a bit better, with a 20% improvement over dense. https://github.com/OrenLeung/CUDALibrarySamples/commit/9cabba4b1154f2c49037d89171d41c31b6033c79 ``` # median...

cuSPARSELt matmul example not working on M=N=K8192

@fbusato thanks for running it. By "800W H100", you mean 700W, right? We also see around a 1.20-1.22x improvement. Would you have any suggestions on shapes where sparsity would show...

[Feature]: Parity With NVIDIA DCGM - Pulse Test

@hliuca by "internally", do you mean in a closed-source package? Is there any way to gain access to that, or is there any way that it could be open...

remove skipifrocm from composability tests

@pytorchbot label ciflow/rocm

remove skipifrocm from composability tests

@jeffdaily all of the failures seem unrelated

going forward use community builds upstream ROCm vllm/sglang images instead of AMD's fork images

@qcolombet @araslanix

Perfetto SGLang profiler trace during CI to make it easier to improve perf

https://github.com/sgl-project/sglang/blob/ddd1440d0f027e85af6be53bbb309683ed7ea2c4/.github/workflows/nightly-test.yml#L49-L64

Use vLLM framework for DeepSeek R1 on MI325 and MI355 hardware

@qcolombet yes, we are looking into it. @cquil11 is just trying to land a massive refactor PR first to reduce tech debt, and then we can look into this one.

Use vLLM framework for DeepSeek R1 on MI325 and MI355 hardware

@merrymercy
