[SYCL] Use PI APIs for cooperative kernels
This change updates the SYCL runtime to use piextKernelSuggestMaxCooperativeGroupCount and piextEnqueueCooperativeKernelLaunch for cooperative kernels. These functions are used to implement the query and launch kernels as described in the sycl_ext_oneapi_root_group extension.
:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.
I'm going to make this ready to review in order to give code owners more time to approve these changes before we merge the UR changes since this requires approvals from a number of different teams.
Any reason these cooperative kernels cannot be fused? Could we implement fusion of cooperative kernels by creating a new cooperative kernel and that's it?
Any reason these cooperative kernels cannot be fused? Could we implement fusion of cooperative kernels by creating a new cooperative kernel and that's it?
Note this is not intended to block this PR, it's just for us to get ready for it
Any reason these cooperative kernels cannot be fused? Could we implement fusion of cooperative kernels by creating a new cooperative kernel and that's it?
I think the reason this wouldn't work is related to the max_num_work_group_sync query. I understood that the fused kernel may have a different maximum, but there wouldn't be any way for the user to query it.
Consider an example with two kernels: kernel A has a max_num_work_group_sync of 4, and kernel B has max_num_work_group_sync of 6. What would be the maximum number of work groups for the fused kernel? A user might expect to be able to synchronize 6, but the device may only support 4 (or fewer).
@Pennycook, could you please comment on this? I'm wondering if it's possible to implement fused cooperative kernels by defining some limitation(s) for interactions between the two features.
@Pennycook, could you please comment on this? I'm wondering if it's possible to implement fused cooperative kernels by defining some limitation(s) for interactions between the two features.
I don't know exactly how a fused cooperative kernel works, so I'm not sure.
If kernel fusion is completely transparent to the user, then I agree the user won't have any way to query a valid number of work-groups to launch. If kernel fusion is programmatic and returns another kernel object/name, then a user could execute the queries on the resulting kernel before launching it.
For Kernel Fusion, we'd need some modifications to the error message here to avoid saying we do not support "Kernel" CG type.
Also, having a test wouldn't hurt. Either reuse this test or creating a new one.
This makes sense. I've updated the message and added a test case in f2dfb69d85019b357ce6cbf1e3f851322faaa6b6. Please let me know if you have a suggestion for a clearer message and I'll make the change.
Looks like after pulling in the latest sycl branch changes and updating the UR tag there's a regression, not sure if this is realted or not.
Timed Out Tests (1):
SYCL :: ESIMD/regression/complex-lib-lin.cpp
Looks like after pulling in the latest sycl branch changes and updating the UR tag there's a regression, not sure if this is realted or not.
Timed Out Tests (1): SYCL :: ESIMD/regression/complex-lib-lin.cpp
Okay, seems to have gone away today after pulling in sycl branch again.
@againull what are your thoughts on the readiness of this PR?
Failed Tests (8):
SYCL :: Assert/assert_in_kernels_win.cpp
SYCL :: Assert/assert_in_multiple_tus_one_ndebug_win.cpp
SYCL :: Assert/assert_in_multiple_tus_win.cpp
SYCL :: Assert/assert_in_one_kernel_win.cpp
SYCL :: Assert/assert_in_simultaneous_kernels_win.cpp
SYCL :: Assert/assert_in_simultaneously_multiple_tus.cpp
SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp
SYCL :: Plugin/sycl-ls-unified-runtime.cpp
These tests are also failing in https://github.com/intel/llvm/pull/12783 and are unrelated to this PR. @intel/llvm-gatekeepers, I think this is ready to merge.
Windows Gen12:
Failed Tests (8):
SYCL :: Assert/assert_in_kernels_win.cpp
SYCL :: Assert/assert_in_multiple_tus_one_ndebug_win.cpp
SYCL :: Assert/assert_in_multiple_tus_win.cpp
SYCL :: Assert/assert_in_one_kernel_win.cpp
SYCL :: Assert/assert_in_simultaneous_kernels_win.cpp
SYCL :: Assert/assert_in_simultaneously_multiple_tus.cpp
SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp
SYCL :: Plugin/sycl-ls-unified-runtime.cpp
Reported in https://github.com/intel/llvm/issues/12797 and https://github.com/intel/llvm/issues/12798