A program is not built on a CPU device in a heterogeneous context
macOS 10.15.2, Python 3.7.5, PyOpenCL 2019.1.2
My machine has three available devices:
- 0: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
- 1: Intel(R) UHD Graphics 630
- 2: AMD Radeon Pro Vega 20 Compute Engine
I am running the following code:
import pyopencl as cl

platform = cl.get_platforms()[0]
devices = platform.get_devices()
# Heterogeneous context: CPU (device 0) plus Intel GPU (device 1)
ctx = cl.Context([devices[0], devices[1]])

src = """
__kernel void sum(
    __global const float *a_g, __global const float *b_g, __global float *res_g)
{
    int gid = get_global_id(0);
    res_g[gid] = a_g[gid] + b_g[gid];
}
"""

prg = cl.Program(ctx, src)
prg = prg.build()

# Report the per-device build status, then the binary sizes
for dev in prg.devices:
    print("status for device", dev, ": ",
          prg.get_build_info(dev, cl.program_build_info.STATUS))
print("prg.binaries", prg.binary_sizes)
With devices 0 and 1 in the context (the ctx = cl.Context([devices[0], devices[1]]) line above), I get the following output:
status for device <pyopencl.Device 'Intel(R) UHD Graphics 630' on 'Apple' at 0x1024500> : 0
status for device <pyopencl.Device 'Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz' on 'Apple' at 0xffffffff> : -2
prg.binaries [2227, 0]
That is, the build for the CPU fails with status -2, and the build logs are empty. The result is the same for devices 0 and 2. For devices 1 and 2 (that is, both GPUs), everything works:
status for device <pyopencl.Device 'Intel(R) UHD Graphics 630' on 'Apple' at 0x1024500> : 0
status for device <pyopencl.Device 'AMD Radeon Pro Vega 20 Compute Engine' on 'Apple' at 0x1021d00> : 0
prg.binaries [2227, 14256]
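The numbers in the status column are raw cl_build_status values from the OpenCL headers. A minimal sketch for decoding them follows; describe_build_status is a hypothetical helper name used only for illustration, not part of PyOpenCL:

```python
# cl_build_status values as defined in the OpenCL headers (CL/cl.h).
BUILD_STATUS_NAMES = {
    0: "CL_BUILD_SUCCESS",
    -1: "CL_BUILD_NONE",
    -2: "CL_BUILD_ERROR",
    -3: "CL_BUILD_IN_PROGRESS",
}


def describe_build_status(code):
    """Return the symbolic name for a raw build status code (hypothetical helper)."""
    return BUILD_STATUS_NAMES.get(code, "unknown status %d" % code)


for code in (0, -2):
    print(code, "->", describe_build_status(code))
```

Read this way, the 0 and -2 in the outputs above are CL_BUILD_SUCCESS and CL_BUILD_ERROR. The build log can be queried like the status, with cl.program_build_info.LOG in place of STATUS.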
The following crude OpenCL/C code, which (as far as I can tell) attempts to do the same thing, returns success statuses for a context with devices 0 and 1. What could be the reason for the errors in Python?
The only platform with multiple devices I have access to is a Linux CUDA deployment, and everything works there (as in your 2-GPU case). Also, since PyOpenCL doesn't distinguish between GPU and CPU devices, I suspect that this is a driver (ICD) implementation issue, not one with PyOpenCL. You could try disabling the cache (export PYOPENCL_NO_CACHE=1) to see if that helps (e.g. if something is unhappy about binary uploads).
> Also, since PyOpenCL doesn't distinguish between GPU and CPU devices, I suspect that this is a driver (ICD) implementation issue, not one with PyOpenCL.
That's why I tested the same thing with OpenCL/C code, and it works there.
> You could try disabling the cache (export PYOPENCL_NO_CACHE=1) to see if that helps (e.g. if something is unhappy about binary uploads).
Doesn't help, unfortunately.
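For anyone repeating this check, the cache can also be switched off from inside the script instead of the shell. This is only a sketch of the same PYOPENCL_NO_CACHE setting mentioned above; setting the variable before importing pyopencl is an assumption made to be safe about when it is read:

```python
import os

# Same effect as `export PYOPENCL_NO_CACHE=1` in the shell.
# Set it before importing pyopencl and before any Program is built.
os.environ["PYOPENCL_NO_CACHE"] = "1"

# import pyopencl as cl   # then create the context and build as usual
```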
A workaround seems to be to create and build a separate Program object for each device used.
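A rough sketch of that workaround, assuming it means creating a fresh Program in the shared context for each device and restricting each build with build(devices=[dev]). The build_per_device name and its program_cls parameter are hypothetical and exist only so the helper can be tried with a stub; I have only the report above to go on, so treat this as untested on the affected macOS setup:

```python
def build_per_device(ctx, src, program_cls=None):
    """Build one Program per device in ctx.

    Returns a list of (device, built_program) pairs, in ctx.devices order.
    program_cls is a hypothetical hook; it defaults to pyopencl.Program.
    """
    if program_cls is None:
        # Imported lazily so the helper can be exercised without OpenCL.
        import pyopencl as cl
        program_cls = cl.Program

    built = []
    for dev in ctx.devices:
        # A separate Program object per device, built for that device only.
        prg = program_cls(ctx, src).build(devices=[dev])
        built.append((dev, prg))
    return built
```

With the ctx and src from the original report, build_per_device(ctx, src) would give one program for the CPU and one for the GPU. Kernels then have to be taken from the program that matches the device of the queue they are enqueued on.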