
A program is not built on a CPU device in a heterogeneous context

Open • fjarri opened this issue 6 years ago • 3 comments

MacOS 10.15.2, Python 3.7.5, PyOpenCL 2019.1.2

My machine has three available devices:

  • 0: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  • 1: Intel(R) UHD Graphics 630
  • 2: AMD Radeon Pro Vega 20 Compute Engine

I am running the following code:

import pyopencl as cl

platform = cl.get_platforms()[0]
devices = platform.get_devices()
ctx = cl.Context([devices[0], devices[1]])

src = """
__kernel void sum(
    __global const float *a_g, __global const float *b_g, __global float *res_g)
{
  int gid = get_global_id(0);
  res_g[gid] = a_g[gid] + b_g[gid];
}
"""

prg = cl.Program(ctx, src)
prg = prg.build()
for i in range(len(prg.devices)):
    print("status for device", prg.devices[i], ": ",
        prg.get_build_info(prg.devices[i], cl.program_build_info.STATUS))
print("prg.binaries", prg.binary_sizes)

When using devices 0 and 1 for the context (that is, ctx = cl.Context([devices[0], devices[1]]) as in the code above), I get the following output:

status for device <pyopencl.Device 'Intel(R) UHD Graphics 630' on 'Apple' at 0x1024500> :  0
status for device <pyopencl.Device 'Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz' on 'Apple' at 0xffffffff> :  -2
prg.binaries [2227, 0]

That is, the build for the CPU fails (the logs are empty). Same result for devices 0 and 2. For devices 1 and 2 (that is, both GPUs), everything works:

status for device <pyopencl.Device 'Intel(R) UHD Graphics 630' on 'Apple' at 0x1024500> :  0
status for device <pyopencl.Device 'AMD Radeon Pro Vega 20 Compute Engine' on 'Apple' at 0x1021d00> :  0
prg.binaries [2227, 14256]
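For reading the outputs above: the numbers printed as the build status are raw cl_build_status values from the OpenCL headers. A minimal sketch of a lookup follows; the helper name describe_build_status is hypothetical and not part of PyOpenCL.

```python
# Values of cl_build_status as defined in the OpenCL headers (CL/cl.h).
BUILD_STATUS_NAMES = {
    0: "CL_BUILD_SUCCESS",
    -1: "CL_BUILD_NONE",
    -2: "CL_BUILD_ERROR",
    -3: "CL_BUILD_IN_PROGRESS",
}


def describe_build_status(status):
    """Return a readable name for a raw cl_build_status integer.

    Hypothetical helper for illustration; not a PyOpenCL function.
    """
    return BUILD_STATUS_NAMES.get(int(status), "unknown status {}".format(status))


# The two statuses seen in the failing CPU + GPU run.
for raw in (0, -2):
    print(raw, "->", describe_build_status(raw))
```

So the -2 reported for the CPU device is CL_BUILD_ERROR, while 0 for the GPUs is CL_BUILD_SUCCESS.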

Crude OpenCL/C code that (as far as I can tell) does the same thing returns success statuses for a context with devices 0 and 1. What could be the reason for the errors in Python?

fjarri, Jan 09 '20 07:01

The only platform with multiple devices I have access to is a Linux CUDA deployment, and everything works there (as in your 2-GPU case). Also, since PyOpenCL doesn't distinguish between GPU and CPU devices, I suspect that this is a driver (ICD) implementation issue, not one with PyOpenCL. You could try disabling the cache (export PYOPENCL_NO_CACHE=1) to see if that helps (e.g. if something is unhappy about binary uploads).

inducer, Jan 09 '20 19:01

> Also, since PyOpenCL doesn't distinguish between GPU and CPU devices, I suspect that this is a driver (ICD) implementation issue, not one with PyOpenCL.

That's why I tested the same thing with OpenCL/C code, and it works there.

> You could try disabling the cache (export PYOPENCL_NO_CACHE=1) to see if that helps (e.g. if something is unhappy about binary uploads).

Doesn't help, unfortunately.

fjarri, Jan 09 '20 20:01

A workaround seems to be to create and build a separate Program object for each device used.

fjarri, Jan 14 '20 07:01
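The workaround from the last comment (one Program object per device) might look roughly like the sketch below. The function name build_per_device and its program_factory parameter are hypothetical, introduced only for illustration. The sketch assumes that Program.build accepts a devices list and returns the program object, as documented for PyOpenCL. The thread does not explain why this avoids the CPU build failure, so treat it as an unverified workaround.

```python
def build_per_device(ctx, src, devices, program_factory=None):
    """Build one separate program per device instead of one shared program.

    Hypothetical helper. Returns a list of (device, built_program) pairs.
    program_factory defaults to pyopencl.Program; it is a parameter only
    so the logic can be exercised without an OpenCL runtime.
    """
    if program_factory is None:
        import pyopencl as cl  # imported lazily so the sketch loads without it
        program_factory = cl.Program

    built = []
    for dev in devices:
        # A fresh Program object for every device, built for that device only.
        prg = program_factory(ctx, src).build(devices=[dev])
        built.append((dev, prg))
    return built
```

With the context and source from the original report, this would be called as build_per_device(ctx, src, ctx.devices), and each kernel would then be launched from the program paired with the device that the command queue targets.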