Wen-Heng (Jack) Chung

Results 6 issues of Wen-Heng (Jack) Chung

When used with `-fno-gpu-rdc`, this allows applications to inject ISA into object files. This flag is intended for kernel developers who need to tweak ISA before an optimization can be devised in...

Amend `clang-ocl` so that it can emit LLVM IR instead of an HSA Code Object. This paves the way for driving MIOpen from MLIR.

`rocprof` requires `rocminfo` in order to run properly, but this dependency is not declared in the Debian package. The dependency on `rocminfo` arises at this point in the source: https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/bin/rpl_run.sh#L205 Debian package...

Only convolution layers are changed in this PR. Other layers that use `MIOpen` have not been changed yet.

## Modifications

Use `tl.range()` in block GEMM kernels, with `num_stages` set by the host, to hint Triton to produce better software pipelining.

## Checklist

- [X] Format your code according to the...

## Modifications

Add additional block quant GEMM tuning configs for AMD GPUs.

## Checklist

- [X] Format your code according to the [Code Formatting with Pre-Commit](https://docs.sglang.ai/references/contribution_guide.html#code-formatting-with-pre-commit).