Wen-Heng (Jack) Chung
When used with `-fno-gpu-rdc`, this allows applications to inject ISA into object files. The flag lets kernel developers tweak ISA before an optimization can be devised in...
Amend `clang-ocl` so it can emit LLVM IR instead of an HSA Code Object. This paves the way for driving MIOpen from MLIR.
`rocprof` depends on `rocminfo` to run properly, but this dependency is not specified in the Debian package. Source code where the dependency on `rocminfo` occurs: https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/bin/rpl_run.sh#L205 Debian package...
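The missing dependency described above only shows up at run time, when the script tries to invoke `rocminfo`. As a minimal sketch of how such a gap could be detected up front, the snippet below checks whether a list of tools is available on `PATH`. The helper name `missing_tools` is hypothetical and is not part of `rocprofiler`; the actual fix is to declare the dependency in the package metadata.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; not part of rocprofiler.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # `rocminfo` is the tool that the rocprof launcher script relies on.
    missing = missing_tools(["rocminfo"])
    if missing:
        print("missing runtime dependencies: " + ", ".join(missing))
    else:
        print("all runtime dependencies found")
```

A check like this only reports the problem; it does not replace a proper dependency declaration in the Debian package.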
Only convolution layers are changed in this PR. Other layers that use `MIOpen` are not changed yet.
## Modifications

Use `tl.range()` in block GEMM kernels, with `num_stages` set by the host, to hint Triton to produce better software pipelining.

## Checklist

- [X] Format your code according to the...
## Modifications

Add additional block quant GEMM tuning configs for AMD GPUs.

## Checklist

- [X] Format your code according to the [Code Formatting with Pre-Commit](https://docs.sglang.ai/references/contribution_guide.html#code-formatting-with-pre-commit).