Halide tests failing with multi-device targets

With HL_JIT_TARGET=host-cuda-openglcompute-opencl, the following tests fail:

correctness_float16_t

Error: OpenCL kernel uses half type, but CLHalf target flag not enabled

Running this with HL_JIT_TARGET=host-cuda-openglcompute-opencl-cl_half results in:

Error: CL: clBuildProgram failed: CL_BUILD_PROGRAM_FAILURE
Build Log:
<kernel>:35:26: warning: cannot enable cl_khr_fp_16 extension on this platform - ignoring.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
                         ^
<kernel>:36:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half half_from_bits(unsigned short x) {return __builtin_astype(x, half);}
            ^
<kernel>:37:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half nan_f16() { return half_from_bits(32767); }
            ^
<kernel>:38:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half neg_inf_f16() { return half_from_bits(31744); }
            ^
<kernel>:39:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half inf_f16() { return half_from_bits(64512); }
            ^
<kernel>:40:29: error: declaring function argument of type '__attr

correctness_gpu_assertion_in_kernel

Warning: Ignoring assertion inside OpenCL kernel: (uint1)t7
Warning: Ignoring assertion inside OpenCL kernel: (uint1)t13
CL: halide_opencl_buffer_copy (user_context: 0x0, src: 0x55c729792968, dst: 0x55c729792968)
CL: validate 0x55c72af6f8c0 offset: 0: asked for 0, actual allocated 1200
    from device to host, 0x55c72b461aa0 + 0 -> 0x55c72af6f380 + 0, 1200 bytes
    Time: 3.144500e-02 ms
CL: halide_opencl_device_free (user_context: 0x0, buf: 0x55c729792968) cl_mem: 0x55c72af6f8c0
CL: validate 0x55c72af6f8c0 offset: 0: asked for 0, actual allocated 1200
    clReleaseMemObject 0x55c72af6f8c0
    Time: 7.053400e-02 ms
CL: halide_opencl_buffer_copy (user_context: 0x0, src: 0x55c729e8ec28, dst: 0x55c729e8ec28)
CL: validate 0x55c72af6f8c0 offset: 0: asked for 0, actual allocated 1600
    from device to host, 0x55c72af76210 + 0 -> 0x55c72af70f80 + 0, 1600 bytes
    Time: 1.695700e-02 ms
CL: halide_opencl_device_free (user_context: 0x0, buf: 0x55c729e8ec28) cl_mem: 0x55c72af6f8c0
CL: validate 0x55c72af6f8c0 offset: 0: asked for 0, actual allocated 1600
    clReleaseMemObject 0x55c72af6f8c0
    Time: 5.798500e-02 ms
There was supposed to be an error

correctness_math

host is:      x86-64-linux-avx-avx2-avx512-avx512_skylake-f16c-fma-sse41
HL_JIT_TARGET is: x86-64-linux-avx-avx2-avx512-avx512_skylake-cuda-cuda_capability_61-f16c-fma-jit-opencl-openglcompute-sse41
Testing abs(float)
Error: In schedule for test_abs, can't create var xi using a split or tile, because xi is already used in this Func's schedule elsewhere.
Vars: x.xi x.x __outermost

correctness_register_shuffle

Error: The OpenCL backend does not support the gpu_lanes() scheduling directive.

performance_gpu_half_throughput

Error: OpenCL kernel uses half type, but CLHalf target flag not enabled

Adding cl_half to the target results in:

<kernel>:35:26: warning: cannot enable cl_khr_fp_16 extension on this platform - ignoring.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
                         ^
<kernel>:36:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half half_from_bits(unsigned short x) {return __builtin_astype(x, half);}
            ^
<kernel>:37:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half nan_f16() { return half_from_bits(32767); }
            ^
<kernel>:38:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half neg_inf_f16() { return half_from_bits(31744); }
            ^
<kernel>:39:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half inf_f16() { return half_from_bits(64512); }
            ^
<kernel>:40:29: error: declaring function argument of type '__attr

None of the tests fail when cuda is the only GPU feature in the target, but including it bypasses some checks that would otherwise cause the tests to finish early (this applies to correctness_gpu_assertion_in_kernel, correctness_register_shuffle, and performance_gpu_half_throughput).

May 21 '20 02:05 alexreinking

Some failures are quite strange. See the following interaction:

alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host ./test/correctness/correctness_float16_t
Success!
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host-cuda ./test/correctness/correctness_float16_t
Success!
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host-opencl ./test/correctness/correctness_float16_t
Success!
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host-opencl-cl_half ./test/correctness/correctness_float16_t
Success!
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host-opencl-cl_half-cuda ./test/correctness/correctness_float16_t
Error: CL: clBuildProgram failed: CL_BUILD_PROGRAM_FAILURE
Build Log:
<kernel>:35:26: warning: cannot enable cl_khr_fp_16 extension on this platform - ignoring.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
                         ^
<kernel>:36:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half half_from_bits(unsigned short x) {return __builtin_astype(x, half);}
            ^
<kernel>:37:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half nan_f16() { return half_from_bits(32767); }
            ^
<kernel>:38:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half neg_inf_f16() { return half_from_bits(31744); }
            ^
<kernel>:39:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half inf_f16() { return half_from_bits(64512); }
            ^
<kernel>:40:29: error: declaring function argument of type '__attr
Aborted (core dumped)
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host-opencl-cl_half-openglcompute ./test/correctness/correctness_float16_t
Success!

May 21 '20 02:05 alexreinking
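
The manual sweep above could be scripted so that each target string is tried in turn and the exit status recorded. This is only a minimal sketch: the helper name `run_with_targets` is hypothetical, the binary path is assumed from the transcript, and the target list is simply the one typed by hand above.

```python
import os
import subprocess

# JIT target strings copied from the transcript above.
TARGETS = [
    "host",
    "host-cuda",
    "host-opencl",
    "host-opencl-cl_half",
    "host-opencl-cl_half-cuda",
    "host-opencl-cl_half-openglcompute",
]


def run_with_targets(cmd, targets=TARGETS):
    """Run cmd once per target with HL_JIT_TARGET set; return {target: exit code}."""
    results = {}
    for target in targets:
        # Copy the current environment and override only HL_JIT_TARGET.
        env = dict(os.environ, HL_JIT_TARGET=target)
        proc = subprocess.run(
            cmd, env=env, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        results[target] = proc.returncode
    return results


if __name__ == "__main__":
    # Path assumed from the transcript; adjust it to your own build tree.
    binary = "./test/correctness/correctness_float16_t"
    if os.path.exists(binary):
        for target, code in run_with_targets([binary]).items():
            status = "ok" if code == 0 else "failed (exit %d)" % code
            print("%-40s %s" % (target, status))
```

A nonzero exit code marks a failing combination; on POSIX systems a process killed by a signal (such as the abort shown above) is reported as a negative value.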

This is interesting to note, but we probably don't want/need to run all the tests this way. There's exactly one I know of (gpu_multi_device) that expects to be run this way currently.

May 21 '20 17:05 steven-johnson

Hey, is there a fix for this?

I'm currently seeing what looks like the same failure when trying to run half-float OpenCL on Arm Mali (arm-64-android-opencl-cl_half):

I halide  : Error: CL: clBuildProgram failed: CL_BUILD_PROGRAM_FAILURE
I halide  : Build Log:
I halide  : <source>:34:26: error: '#' is not followed by a macro parameter
I halide  : #define halide_unused(x)#pragma OPENCL EXTENSION cl_khr_fp16 : enable
I halide  :                          ^
I halide  : 
I halide  : <source>:34:33: error: unknown type name 'OPENCL'
I halide  : #define halide_unused(x)#pragma OPENCL EXTENSION cl_khr_fp16 : enable
I halide  :                                 ^
I halide  : 
I halide  : <source>:34:49: error: expected ';' after top level declarator
I halide  : #define halide_unused(x)#pragma OPENCL EXTENSION cl_khr_fp16 : enable
I halide  :                                                 ^
I halide  : 
I halide  : error: Compiler frontend failed (error code 60)

Nov 25 '20 19:11 l-oneil

If you are building your own Halide, can you try pulling this branch: https://github.com/halide/Halide/tree/gpu_context_consistency and see if it fixes it?

EDIT: Actually, that seems doubtful, since these look like compiler errors. But if multiple contexts are involved, it might be the issue.

Nov 25 '20 20:11 zvookin

The issue l-oneil raised was fixed around September 16th in this commit: https://github.com/halide/Halide/commit/e53d04e38c4a52ad18d12b54e019b7075d578d72#diff-6f4ec45f85e77617c9347eb7c6148ab56faf6a97ef6d7a193a4b20ce54261078 (Not sure whether that covers the entirety of this bug.)

Nov 25 '20 20:11 zvookin

Thank you very much! Re-reading the error, I see that my problem was unrelated to this issue and was caused by the missing newline. I suspect that commit will fix it; trying the latest commit now 👍

Nov 25 '20 21:11 l-oneil
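
For readers puzzled by the Mali build log, the failure mode can be reproduced in a few lines: when a source emitter writes a `#define` without a trailing newline, the next directive lands on the same line and the compiler parses it as part of the macro body. The sketch below is an illustration only, not Halide's actual code generator; `emit_preamble` is a hypothetical helper, and the two directive strings are taken from the log.

```python
def emit_preamble(terminate_define):
    """Concatenate two preprocessor lines the way a source emitter might."""
    define = "#define halide_unused(x)"
    pragma = "#pragma OPENCL EXTENSION cl_khr_fp16 : enable\n"
    # Without the newline, the pragma is glued onto the macro definition.
    return define + ("\n" if terminate_define else "") + pragma


broken = emit_preamble(terminate_define=False)
fixed = emit_preamble(terminate_define=True)

# One fused line versus two separate directives.
print(broken.splitlines())
print(fixed.splitlines())
```

The fused line in `broken` matches the source line quoted in the log, which is why the compiler complains about a `#` inside a macro definition rather than about half-float support.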