Tests failing with multi-device targets.
With HL_JIT_TARGET=host-cuda-openglcompute-opencl, the following tests fail:
correctness_float16_t
Error: OpenCL kernel uses half type, but CLHalf target flag not enabled
Running this with HL_JIT_TARGET=host-cuda-openglcompute-opencl-cl_half results in:
Error: CL: clBuildProgram failed: CL_BUILD_PROGRAM_FAILURE
Build Log:
<kernel>:35:26: warning: cannot enable cl_khr_fp_16 extension on this platform - ignoring.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
^
<kernel>:36:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half half_from_bits(unsigned short x) {return __builtin_astype(x, half);}
^
<kernel>:37:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half nan_f16() { return half_from_bits(32767); }
^
<kernel>:38:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half neg_inf_f16() { return half_from_bits(31744); }
^
<kernel>:39:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half inf_f16() { return half_from_bits(64512); }
^
<kernel>:40:29: error: declaring function argument of type '__attr
correctness_gpu_assertion_in_kernel
Warning: Ignoring assertion inside OpenCL kernel: (uint1)t7
Warning: Ignoring assertion inside OpenCL kernel: (uint1)t13
CL: halide_opencl_buffer_copy (user_context: 0x0, src: 0x55c729792968, dst: 0x55c729792968)
CL: validate 0x55c72af6f8c0 offset: 0: asked for 0, actual allocated 1200
from device to host, 0x55c72b461aa0 + 0 -> 0x55c72af6f380 + 0, 1200 bytes
Time: 3.144500e-02 ms
CL: halide_opencl_device_free (user_context: 0x0, buf: 0x55c729792968) cl_mem: 0x55c72af6f8c0
CL: validate 0x55c72af6f8c0 offset: 0: asked for 0, actual allocated 1200
clReleaseMemObject 0x55c72af6f8c0
Time: 7.053400e-02 ms
CL: halide_opencl_buffer_copy (user_context: 0x0, src: 0x55c729e8ec28, dst: 0x55c729e8ec28)
CL: validate 0x55c72af6f8c0 offset: 0: asked for 0, actual allocated 1600
from device to host, 0x55c72af76210 + 0 -> 0x55c72af70f80 + 0, 1600 bytes
Time: 1.695700e-02 ms
CL: halide_opencl_device_free (user_context: 0x0, buf: 0x55c729e8ec28) cl_mem: 0x55c72af6f8c0
CL: validate 0x55c72af6f8c0 offset: 0: asked for 0, actual allocated 1600
clReleaseMemObject 0x55c72af6f8c0
Time: 5.798500e-02 ms
There was supposed to be an error
correctness_math
host is: x86-64-linux-avx-avx2-avx512-avx512_skylake-f16c-fma-sse41
HL_JIT_TARGET is: x86-64-linux-avx-avx2-avx512-avx512_skylake-cuda-cuda_capability_61-f16c-fma-jit-opencl-openglcompute-sse41
Testing abs(float)
Error: In schedule for test_abs, can't create var xi using a split or tile, because xi is already used in this Func's schedule elsewhere.
Vars: x.xi x.x __outermost
correctness_register_shuffle
Error: The OpenCL backend does not support the gpu_lanes() scheduling directive.
performance_gpu_half_throughput
Error: OpenCL kernel uses half type, but CLHalf target flag not enabled
Adding cl_half to the target results in:
<kernel>:35:26: warning: cannot enable cl_khr_fp_16 extension on this platform - ignoring.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
^
<kernel>:36:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half half_from_bits(unsigned short x) {return __builtin_astype(x, half);}
^
<kernel>:37:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half nan_f16() { return half_from_bits(32767); }
^
<kernel>:38:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half neg_inf_f16() { return half_from_bits(31744); }
^
<kernel>:39:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half inf_f16() { return half_from_bits(64512); }
^
<kernel>:40:29: error: declaring function argument of type '__attr
None of the tests fail when only cuda is included, but including it skips some checks that would otherwise cause the tests to finish early (this applies to correctness_gpu_assertion_in_kernel, correctness_register_shuffle, and performance_gpu_half_throughput).
Some failures are quite strange. See the following interaction:
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host ./test/correctness/correctness_float16_t
Success!
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host-cuda ./test/correctness/correctness_float16_t
Success!
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host-opencl ./test/correctness/correctness_float16_t
Success!
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host-opencl-cl_half ./test/correctness/correctness_float16_t
Success!
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host-opencl-cl_half-cuda ./test/correctness/correctness_float16_t
Error: CL: clBuildProgram failed: CL_BUILD_PROGRAM_FAILURE
Build Log:
<kernel>:35:26: warning: cannot enable cl_khr_fp_16 extension on this platform - ignoring.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
^
<kernel>:36:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half half_from_bits(unsigned short x) {return __builtin_astype(x, half);}
^
<kernel>:37:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half nan_f16() { return half_from_bits(32767); }
^
<kernel>:38:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half neg_inf_f16() { return half_from_bits(31744); }
^
<kernel>:39:13: error: declaring function return value of type 'half' is not allowed; did you forget * ?
inline half inf_f16() { return half_from_bits(64512); }
^
<kernel>:40:29: error: declaring function argument of type '__attr
Aborted (core dumped)
alex@alex-ubuntu:~/Development/Halide/cmake-build-release$ HL_JIT_TARGET=host-opencl-cl_half-openglcompute ./test/correctness/correctness_float16_t
Success!
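For anyone who wants to repeat the comparison above, the runs can be scripted as a small loop. This is only a minimal sketch: run_matrix is a hypothetical helper name, not part of Halide's test tooling, and the binary path in the usage comment is the one from my build directory.

```shell
#!/bin/sh
# run_matrix is a hypothetical helper, not part of Halide's test tooling.
# Usage: run_matrix <test-binary> <target> [<target> ...]
run_matrix() {
    bin="$1"
    shift
    for target in "$@"; do
        # Set HL_JIT_TARGET for this one invocation only; discard test output.
        if HL_JIT_TARGET="$target" "$bin" >/dev/null 2>&1; then
            echo "PASS $target"
        else
            echo "FAIL $target"
        fi
    done
}

# Example, using the path and targets from the session above:
# run_matrix ./test/correctness/correctness_float16_t \
#     host host-cuda host-opencl host-opencl-cl_half \
#     host-opencl-cl_half-cuda host-opencl-cl_half-openglcompute
```

It prints one PASS or FAIL line per target, based only on the exit status, so an abort (as in the core dump above) counts as a failure.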
This is interesting to note, but we probably don't want/need to run all the tests this way. There's exactly one I know of (gpu_multi_device) that expects to be run this way currently.
Hey,
Is there a fix for this?
I'm currently seeing exactly the same failure when trying to run half-float OpenCL on Arm Mali (arm-64-android-opencl-cl_half):
I halide : Error: CL: clBuildProgram failed: CL_BUILD_PROGRAM_FAILURE
I halide : Build Log:
I halide : <source>:34:26: error: '#' is not followed by a macro parameter
I halide : #define halide_unused(x)#pragma OPENCL EXTENSION cl_khr_fp16 : enable
I halide : ^
I halide :
I halide : <source>:34:33: error: unknown type name 'OPENCL'
I halide : #define halide_unused(x)#pragma OPENCL EXTENSION cl_khr_fp16 : enable
I halide : ^
I halide :
I halide : <source>:34:49: error: expected ';' after top level declarator
I halide : #define halide_unused(x)#pragma OPENCL EXTENSION cl_khr_fp16 : enable
I halide : ^
I halide :
I halide : error: Compiler frontend failed (error code 60)
If you are building your own Halide, can you try pulling this branch: https://github.com/halide/Halide/tree/gpu_context_consistency and see if it fixes it?
EDIT: Actually, seems doubtful as these look like compiler errors, but if there are multiple contexts involved, it might be the issue.
The issue l-oneil raised was fixed around September 16th in this commit: https://github.com/halide/Halide/commit/e53d04e38c4a52ad18d12b54e019b7075d578d72#diff-6f4ec45f85e77617c9347eb7c6148ab56faf6a97ef6d7a193a4b20ce54261078 (Not sure if that is the entirety of this bug.)
Thank you very much! Re-reading the error, I see this was unrelated and was caused by the missing newline after the halide_unused define. I suspect this will fix the issue; working on the latest commit now 👍
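For anyone else who hits the Mali build log above: the rejected line is what you get when two source fragments are joined with no newline between them. Inside a function-like macro body, '#' is the stringizing operator and has to be followed by a macro parameter, which is why the compiler complains about the '#' of the pragma. Below is a minimal shell sketch of the effect. It is not Halide's actual code generation, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Illustrative fragments; the variable names are made up for this sketch.
define_line='#define halide_unused(x)'
pragma_line='#pragma OPENCL EXTENSION cl_khr_fp16 : enable'

# Joined with no separator, the pragma lands on the #define line,
# giving the line the compiler rejects in the build log.
fused=$(printf '%s%s' "$define_line" "$pragma_line")

# With a newline between the fragments, each directive starts its own line.
separated=$(printf '%s\n%s' "$define_line" "$pragma_line")

printf 'fused:\n%s\n\nseparated:\n%s\n' "$fused" "$separated"
```

The fused string is a single line, while the separated one has the define and the pragma on their own lines.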