[CUDA][E2E] bindless_images/read_norm_types.cpp failed on SYCL Nightly
Describe the bug
bindless_images/read_norm_types.cpp Refer: https://github.com/intel/llvm/actions/runs/8548669532/job/23442092835
2024-04-04T03:48:41.2565096Z FAIL: SYCL :: bindless_images/read_norm_types.cpp (1910 of 1967)
2024-04-04T03:48:41.2566644Z ******************** TEST 'SYCL :: bindless_images/read_norm_types.cpp' FAILED ********************
2024-04-04T03:48:41.2567397Z Exit Code: 1
2024-04-04T03:48:41.2567596Z
2024-04-04T03:48:41.2567761Z Command Output (stdout):
2024-04-04T03:48:41.2568176Z --
2024-04-04T03:48:41.2568489Z # RUN: at line 4
2024-04-04T03:48:41.2571279Z /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/bindless_images/read_norm_types.cpp -o /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2575803Z # executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/bindless_images/read_norm_types.cpp -o /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2596880Z # note: command had no output on stdout or stderr
2024-04-04T03:48:41.2597862Z # RUN: at line 5
2024-04-04T03:48:41.2609779Z env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2613354Z # executed command: env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2615612Z # .---command stderr------------
2024-04-04T03:48:41.2616385Z # | Some test cases failed
2024-04-04T03:48:41.2617158Z # `-----------------------------
2024-04-04T03:48:41.2617953Z # error: command failed with exit status: 1
2024-04-04T03:48:41.2618584Z
2024-04-04T03:48:41.2618842Z --
To reproduce
DPC++ version: fe44bad6c658165a82953f86ce1f7f70529f2c85
Environment
sycl-ls --verbose output:
2024-04-04T03:22:44.5499901Z ur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
2024-04-04T03:22:44.5501510Z [cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5502289Z
2024-04-04T03:22:44.5502372Z Platforms: 1
2024-04-04T03:22:44.5502585Z Platform [#1]:
2024-04-04T03:22:44.5503010Z Version : CUDA 12.1
2024-04-04T03:22:44.5503546Z Name : NVIDIA CUDA BACKEND
2024-04-04T03:22:44.5504233Z Vendor : NVIDIA Corporation
2024-04-04T03:22:44.5504845Z Devices : 1
2024-04-04T03:22:44.5505458Z Device [#0]:
2024-04-04T03:22:44.5506177Z Type : gpu
2024-04-04T03:22:44.5506842Z Version : 8.6
2024-04-04T03:22:44.5507577Z Name : NVIDIA GeForce RTX 3090
2024-04-04T03:22:44.5508469Z Vendor : NVIDIA Corporation
2024-04-04T03:22:44.5509292Z Driver : CUDA 12.1
2024-04-04T03:22:44.5517816Z Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph
2024-04-04T03:22:44.5536144Z info::device::sub_group_sizes: 32
2024-04-04T03:22:44.5537445Z default_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5539394Z accelerator_selector() : No device of requested type available. -1 (PI_ERRO...
2024-04-04T03:22:44.5541078Z cpu_selector() : No device of requested type available. -1 (PI_ERRO...
2024-04-04T03:22:44.5542629Z gpu_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5544340Z custom_selector(gpu) : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5545985Z custom_selector(cpu) : No device of requested type available. -1 (PI_ERRO...
2024-04-04T03:22:44.5546682Z custom_selector(acc) : No device of requested type available. -1 (PI_ERRO...
Additional context
No response
@JackAKirk FYI
Thank you for this report. We're aware of the issue and are tracking internally.
For now, we've XFAILed the test so that it doesn't affect CI while we continue to investigate this issue. See here: https://github.com/intel/llvm/pull/13425
Please see this PR which addresses the failure in the test.
get_global_id(x) where x>NDims in the nd_item causes the kernel to crash. Simply moving get_global_id(x) inside of the constexprs which guard by the dimension fixes the issue.
Note: the SYCL spec does not define exactly what should happen in the case where get_global_id(x) is called with x>NDims, so I have opened an issue with the Khronos SYCL docs.