llvm icon indicating copy to clipboard operation
llvm copied to clipboard

[SYCL] Use PI APIs for cooperative kernels

Open 0x12CC opened this issue 2 years ago • 10 comments

This change updates the SYCL runtime to use piextKernelSuggestMaxCooperativeGroupCount and piextEnqueueCooperativeKernelLaunch for cooperative kernels. These functions are used to implement the query and launch kernels as described in the sycl_ext_oneapi_root_group extension.

0x12CC avatar Jan 11 '24 18:01 0x12CC

:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.

github-actions[bot] avatar Jan 11 '24 18:01 github-actions[bot]

I'm going to make this ready to review in order to give code owners more time to approve these changes before we merge the UR changes since this requires approvals from a number of different teams.

kbenzie avatar Feb 13 '24 10:02 kbenzie

Any reason these cooperative kernels cannot be fused? Could we implement fusion of cooperative kernels by creating a new cooperative kernel and that's it?

victor-eds avatar Feb 13 '24 11:02 victor-eds

Any reason these cooperative kernels cannot be fused? Could we implement fusion of cooperative kernels by creating a new cooperative kernel and that's it?

Note this is not intended to block this PR, it's just for us to get ready for it

victor-eds avatar Feb 13 '24 12:02 victor-eds

Any reason these cooperative kernels cannot be fused? Could we implement fusion of cooperative kernels by creating a new cooperative kernel and that's it?

I think the reason this wouldn't work is related to the max_num_work_group_sync query. I understood that the fused kernel may have a different maximum, but there wouldn't be any way for the user to query it.

Consider an example with two kernels: kernel A has a max_num_work_group_sync of 4, and kernel B has max_num_work_group_sync of 6. What would be the maximum number of work groups for the fused kernel? A user might expect to be able to synchronize 6, but the device may only support 4 (or fewer).

@Pennycook, could you please comment on this? I'm wondering if it's possible to implement fused cooperative kernels by defining some limitation(s) for interactions between the two features.

0x12CC avatar Feb 13 '24 14:02 0x12CC

@Pennycook, could you please comment on this? I'm wondering if it's possible to implement fused cooperative kernels by defining some limitation(s) for interactions between the two features.

I don't know exactly how a fused cooperative kernel works, so I'm not sure.

If kernel fusion is completely transparent to the user, then I agree the user won't have any way to query a valid number of work-groups to launch. If kernel fusion is programmatic and returns another kernel object/name, then a user could execute the queries on the resulting kernel before launching it.

Pennycook avatar Feb 13 '24 15:02 Pennycook

For Kernel Fusion, we'd need some modifications to the error message here to avoid saying we do not support "Kernel" CG type.

Also, having a test wouldn't hurt. Either reuse this test or creating a new one.

This makes sense. I've updated the message and added a test case in f2dfb69d85019b357ce6cbf1e3f851322faaa6b6. Please let me know if you have a suggestion for a clearer message and I'll make the change.

0x12CC avatar Feb 14 '24 19:02 0x12CC

Looks like after pulling in the latest sycl branch changes and updating the UR tag there's a regression, not sure if this is realted or not.

Timed Out Tests (1):
  SYCL :: ESIMD/regression/complex-lib-lin.cpp

kbenzie avatar Feb 19 '24 15:02 kbenzie

Looks like after pulling in the latest sycl branch changes and updating the UR tag there's a regression, not sure if this is realted or not.

Timed Out Tests (1):
  SYCL :: ESIMD/regression/complex-lib-lin.cpp

Okay, seems to have gone away today after pulling in sycl branch again.

@againull what are your thoughts on the readiness of this PR?

kbenzie avatar Feb 20 '24 15:02 kbenzie

Failed Tests (8):
  SYCL :: Assert/assert_in_kernels_win.cpp
  SYCL :: Assert/assert_in_multiple_tus_one_ndebug_win.cpp
  SYCL :: Assert/assert_in_multiple_tus_win.cpp
  SYCL :: Assert/assert_in_one_kernel_win.cpp
  SYCL :: Assert/assert_in_simultaneous_kernels_win.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp
  SYCL :: Plugin/sycl-ls-unified-runtime.cpp

These tests are also failing in https://github.com/intel/llvm/pull/12783 and are unrelated to this PR. @intel/llvm-gatekeepers, I think this is ready to merge.

0x12CC avatar Feb 21 '24 22:02 0x12CC

Windows Gen12:

Failed Tests (8):
  SYCL :: Assert/assert_in_kernels_win.cpp
  SYCL :: Assert/assert_in_multiple_tus_one_ndebug_win.cpp
  SYCL :: Assert/assert_in_multiple_tus_win.cpp
  SYCL :: Assert/assert_in_one_kernel_win.cpp
  SYCL :: Assert/assert_in_simultaneous_kernels_win.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp
  SYCL :: Plugin/sycl-ls-unified-runtime.cpp

Reported in https://github.com/intel/llvm/issues/12797 and https://github.com/intel/llvm/issues/12798

steffenlarsen avatar Feb 22 '24 15:02 steffenlarsen