Larry Meadows

Results 7 issues of Larry Meadows

1. SYCL performance on NVIDIA A100 is currently 2-3% worse than native CUDA. Inspection of the PTX generated by SYCL shows extra parameters and instructions due to accessors and buffers....

I tried to write a simple tool that I can load with LD_PRELOAD (in the spirit of MPI tools for example, or OpenCL interposers). It seems to initialize OK but...

Signed-off-by: Larry Meadows # Adding a New Sample(s) ## Description Two vector addition samples: tiled vector addition using local accessors, and vector addition demonstrating different uses of conditionals. ## Checklist...

--hip-trace gives the COPY calls but the number of bytes transferred is glaringly missing. Is this in the HSA trace layer? There's a mention of being able to trace specific...

I'm trying to hack my way around it. I just want to build a local copy of the library etc. All the use of CMake environment variables is making it difficult....

The code in tblextr.py uses regular expressions that do not account for the universe of clang-produced demangled names. I argue that the right fix for this is to use mangled...
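To illustrate the concern, here is a minimal sketch in Python. The pattern `NAIVE_DEMANGLED` is hypothetical and is not the expression used in tblextr.py, and the sample demangled strings are only illustrative of what a demangler might print. The sketch assumes Itanium C++ ABI mangling, where mangled names begin with `_Z`.

```python
import re

# Hypothetical pattern (NOT taken from tblextr.py): expects "name(args)".
NAIVE_DEMANGLED = re.compile(r"^(?P<name>[\w:]+)\((?P<args>[^()]*)\)$")

# Assumption: Itanium C++ ABI mangled names start with "_Z" and contain
# only identifier-like characters, so one simple pattern covers them.
MANGLED = re.compile(r"^_Z[\w$.]+$")


def naive_kernel_name(demangled):
    """Return the function name if the naive pattern matches, else None."""
    m = NAIVE_DEMANGLED.match(demangled)
    return m.group("name") if m else None


def is_mangled(symbol):
    """Return True if the symbol looks like an Itanium-mangled name."""
    return MANGLED.match(symbol) is not None


# Illustrative demangled strings; real demangler output may differ.
demangled_samples = [
    "vecadd(float*, float*, float*, int)",                  # plain function
    "void scale<float>(float*, float, int)",                # template with return type
    "typeinfo name for main::{lambda(sycl::id<1>)#1}",      # lambda-based kernel name
]

for s in demangled_samples:
    print(repr(s), "->", naive_kernel_name(s))

print(is_mangled("_Z6vecaddPfS_S_i"))
```

Only the first sample matches the naive pattern. The template and lambda forms fall through, whereas a mangled symbol has a single predictable shape.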

It is very useful to have the mangled names when performing tasks like searching binaries for kernel assembly code. It may be possible to do this with c++filt on the...
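As a rough sketch of keeping both forms side by side, the Python helper below pipes a list of mangled symbols through `c++filt` and returns a mapping from each mangled name to its demangled form. The helper name `demangle` and its fallback behavior are assumptions made for illustration. It relies only on `c++filt` reading symbols from standard input, one per line, and it does not recover a mangled name from a demangled one.

```python
import shutil
import subprocess


def demangle(symbols, tool="c++filt"):
    """Return {mangled: demangled} for the given symbols.

    If the tool is not on PATH (or the list is empty), each symbol
    maps to itself so callers can still search by mangled name.
    """
    symbols = list(symbols)
    if not symbols or shutil.which(tool) is None:
        return {s: s for s in symbols}
    # c++filt with no arguments reads names from stdin, one per line.
    result = subprocess.run(
        [tool],
        input="\n".join(symbols) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return dict(zip(symbols, result.stdout.splitlines()))


if __name__ == "__main__":
    for mangled, pretty in demangle(["_Z6vecaddPfS_S_i"]).items():
        print(mangled, "=>", pretty)
```

With the mangled name retained as the key, it can be used directly as a search string against a disassembly or symbol table.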