llvm [CUDA][E2E] bindless_images/read_norm_types.cpp failed on SYCL Nightly

Describe the bug

bindless_images/read_norm_types.cpp Refer: https://github.com/intel/llvm/actions/runs/8548669532/job/23442092835

2024-04-04T03:48:41.2565096Z FAIL: SYCL :: bindless_images/read_norm_types.cpp (1910 of 1967)
2024-04-04T03:48:41.2566644Z ******************** TEST 'SYCL :: bindless_images/read_norm_types.cpp' FAILED ********************
2024-04-04T03:48:41.2567397Z Exit Code: 1
2024-04-04T03:48:41.2567596Z 
2024-04-04T03:48:41.2567761Z Command Output (stdout):
2024-04-04T03:48:41.2568176Z --
2024-04-04T03:48:41.2568489Z # RUN: at line 4
2024-04-04T03:48:41.2571279Z /__w/llvm/llvm/toolchain/bin//clang++   -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/bindless_images/read_norm_types.cpp -o /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2575803Z # executed command: /__w/llvm/llvm/toolchain/bin//clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda /__w/llvm/llvm/llvm/sycl/test-e2e/bindless_images/read_norm_types.cpp -o /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2596880Z # note: command had no output on stdout or stderr
2024-04-04T03:48:41.2597862Z # RUN: at line 5
2024-04-04T03:48:41.2609779Z env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu  /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2613354Z # executed command: env SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1 ONEAPI_DEVICE_SELECTOR=cuda:gpu /__w/llvm/llvm/build-e2e/bindless_images/Output/read_norm_types.cpp.tmp.out
2024-04-04T03:48:41.2615612Z # .---command stderr------------
2024-04-04T03:48:41.2616385Z # | Some test cases failed
2024-04-04T03:48:41.2617158Z # `-----------------------------
2024-04-04T03:48:41.2617953Z # error: command failed with exit status: 1
2024-04-04T03:48:41.2618584Z 
2024-04-04T03:48:41.2618842Z --

To reproduce

DPC++ version: fe44bad6c658165a82953f86ce1f7f70529f2c85

Environment

sycl-ls --verbose output:

2024-04-04T03:22:44.5499901Z ur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
2024-04-04T03:22:44.5501510Z [cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5502289Z 
2024-04-04T03:22:44.5502372Z Platforms: 1
2024-04-04T03:22:44.5502585Z Platform [#1]:
2024-04-04T03:22:44.5503010Z     Version  : CUDA 12.1
2024-04-04T03:22:44.5503546Z     Name     : NVIDIA CUDA BACKEND
2024-04-04T03:22:44.5504233Z     Vendor   : NVIDIA Corporation
2024-04-04T03:22:44.5504845Z     Devices  : 1
2024-04-04T03:22:44.5505458Z         Device [#0]:
2024-04-04T03:22:44.5506177Z         Type       : gpu
2024-04-04T03:22:44.5506842Z         Version    : 8.6
2024-04-04T03:22:44.5507577Z         Name       : NVIDIA GeForce RTX 3090
2024-04-04T03:22:44.5508469Z         Vendor     : NVIDIA Corporation
2024-04-04T03:22:44.5509292Z         Driver     : CUDA 12.1
2024-04-04T03:22:44.5517816Z         Aspects    : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph
2024-04-04T03:22:44.5536144Z         info::device::sub_group_sizes: 32
2024-04-04T03:22:44.5537445Z default_selector()      : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5539394Z accelerator_selector()  : No device of requested type available. -1 (PI_ERRO...
2024-04-04T03:22:44.5541078Z cpu_selector()          : No device of requested type available. -1 (PI_ERRO...
2024-04-04T03:22:44.5542629Z gpu_selector()          : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5544340Z custom_selector(gpu)    : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 8.6 [CUDA 12.1]
2024-04-04T03:22:44.5545985Z custom_selector(cpu)    : No device of requested type available. -1 (PI_ERRO...
2024-04-04T03:22:44.5546682Z custom_selector(acc)    : No device of requested type available. -1 (PI_ERRO...

Additional context

No response

Apr 04 '24 13:04 uditagarwal97

@JackAKirk FYI

Apr 04 '24 13:04 uditagarwal97

Thank you for this report. We're aware of the issue and are tracking internally.

Apr 04 '24 13:04 Seanst98

For now, we've XFAILed the test so that it doesn't affect CI while we continue to investigate this issue. See here: https://github.com/intel/llvm/pull/13425

Apr 17 '24 14:04 Seanst98

Please see this PR which addresses the failure in the test.

get_global_id(x) where x>NDims in the nd_item causes the kernel to crash. Simply moving get_global_id(x) inside of the constexprs which guard by the dimension fixes the issue.

Note: the SYCL spec does not define exactly what should happen in the case where get_global_id(x) is called with x>NDims, so I have opened an issue with the Khronos SYCL docs.

May 09 '24 09:05 Seanst98