Updates for UR API with LevelZero backend for MKL
With MKL version 20250001 on Aurora, it looks like the Level Zero backend needs all devices to be present in the context for MKLShim and its interop with the MKL layer to work.
It runs regular parallel_for kernels fine, but calling hipBLAS functions produces this error:
```
MKL Warning: Incompatible OpenCL driver version. GPU performance may be reduced.
Caught synchronous SYCL exception during GEMM:
Level-Zero error: 70000004 (1879048196)
On device: 'Intel(R) Data Center GPU Max 1550'
in kernel: oneapi::mkl::blas::sgemm_incopy
```
From the trace, the MKL module is failing at build time:
```
01:29:51.286082254 - x4204c0s1b0n0 - vpid: 177150, vtid: 177150 - lttng_ust_ze_build:log: { buildLog: "error: Kernel compiled with required subgroup size 8, which is unsupported on this platform in kernel: 'sgemm_incopy'error: backend compiler failed build.\n" }
```
If I try with the 2025.1 SDK, I get:
```
MKL Warning: Incompatible OpenCL driver version. GPU performance may be reduced.
Caught synchronous SYCL exception during GEMM:
Not all devices are associated with the context or vector of devices is empty
OpenCL status: 8
```
Because of this error message, I changed the code to pass all available devices (sycl::device::get_devices()) to the sycl::context creation instead of just the device object we got from make_device. MKL now seems to need the context to contain all GPUs, possibly because it queries all GPUs via OpenCL and compares them against the queue given to it via Level Zero. Note that the make_queue call still uses just one SYCL device, so we still only use one device. This change worked for me, and I confirmed that sgemm does indeed run on the GPU.
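For illustration, the interop path now looks roughly like the sketch below. This is hand-written against the sycl_ext_oneapi_backend_level_zero interop API (aggregate field order per the current extension spec), not copied from the shim; the makeInteropQueue wrapper and the ze_* handle parameters are hypothetical stand-ins for whatever the shim actually holds.

```cpp
#include <level_zero/ze_api.h>
#include <sycl/sycl.hpp>
#include <sycl/ext/oneapi/backend/level_zero.hpp>
#include <vector>

// Hypothetical wrapper: build the SYCL interop objects from native
// Level Zero handles the way the shim now does.
sycl::queue makeInteropQueue(ze_device_handle_t zeDev,
                             ze_context_handle_t zeCtx,
                             ze_command_queue_handle_t zeQueue) {
  namespace lz = sycl::ext::oneapi::level_zero;

  // The one device we actually run on, wrapped from the native handle.
  sycl::device dev =
      sycl::make_device<sycl::backend::ext_oneapi_level_zero>(zeDev);

  // Previously only `dev` went into the context. Now the context gets
  // every visible GPU so MKL's OpenCL-side device query can match them.
  std::vector<sycl::device> allDevs =
      sycl::device::get_devices(sycl::info::device_type::gpu);

  sycl::context ctx =
      sycl::make_context<sycl::backend::ext_oneapi_level_zero>(
          {zeCtx, allDevs, lz::ownership::keep});

  // The queue still targets the single device, so only one GPU is used.
  // ownership::keep tells the SYCL runtime not to destroy the native
  // queue handle (see the keepOwnership note below).
  return sycl::make_queue<sycl::backend::ext_oneapi_level_zero>(
      {zeQueue, dev, lz::ownership::keep}, ctx);
}
```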
This also changes make_queue to use keepOwnership=true, since otherwise creating a second queue from the same context deletes the previous queue (we saw this with https://github.com/CHIP-SPV/H4I-MKLShim/pull/27).
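For reference, this is the kind of call that now succeeds, sketched with oneMKL's USM gemm API; the checkSgemm helper, sizes, and fill values are illustrative, not from the shim or its tests:

```cpp
#include <cstdint>
#include <oneapi/mkl/blas.hpp>
#include <sycl/sycl.hpp>

// Sketch: run a small sgemm through oneMKL on the interop queue built above.
void checkSgemm(sycl::queue &q) {
  constexpr std::int64_t n = 64;
  float *a = sycl::malloc_device<float>(n * n, q);
  float *b = sycl::malloc_device<float>(n * n, q);
  float *c = sycl::malloc_device<float>(n * n, q);
  q.fill(a, 1.0f, n * n).wait();
  q.fill(b, 1.0f, n * n).wait();

  // This dispatches the oneapi::mkl::blas::sgemm_* kernels that previously
  // failed with the Level-Zero module build error.
  oneapi::mkl::blas::column_major::gemm(
      q, oneapi::mkl::transpose::nontrans, oneapi::mkl::transpose::nontrans,
      n, n, n, 1.0f, a, n, b, n, 0.0f, c, n).wait();

  sycl::free(a, q);
  sycl::free(b, q);
  sycl::free(c, q);
}
```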
If there's a better solution, we can use that instead.