unified-runtime Implement urKernelGetSuggestedLocalWorkSize

This PR try to implement the API urKernelGetSuggestedLocalWorkSize, discussed in https://github.com/oneapi-src/unified-runtime/issues/1270. SYCLOS PR: https://github.com/intel/llvm/pull/12902

Also fix:

For Level-Zero: when LocalWorkSize is provided, urEnqueueKernelLaunch() will read LocalWorkSize without respecting workDim.

Feb 27 '24 05:02 yingcong-wu

Do we need to add some conformance tests?

Mar 11 '24 08:03 zhaomaosu

Do we need to add some conformance tests?

Yang has asked me about this, but I can't think of anything meaningful. Do you have some ideas for adding tests for this?

Mar 11 '24 09:03 yingcong-wu

Codecov Report

Attention: Patch coverage is 5.51724% with 137 lines in your changes are missing coverage. Please review.

Project coverage is 12.40%. Comparing base (78ef1ca) to head (9850c12). Report is 188 commits behind head on main.

Files	Patch %	Lines
...mance/kernel/urKernelGetSuggestedLocalWorkSize.cpp	0.00%	46 Missing :warning:
include/ur_print.hpp	0.00%	29 Missing :warning:
source/loader/layers/validation/ur_valddi.cpp	11.53%	23 Missing :warning:
source/loader/layers/tracing/ur_trcddi.cpp	23.07%	10 Missing :warning:
source/loader/ur_ldrddi.cpp	9.09%	10 Missing :warning:
source/loader/ur_libapi.cpp	0.00%	8 Missing :warning:
source/adapters/null/ur_nullddi.cpp	12.50%	7 Missing :warning:
source/loader/ur_print.cpp	0.00%	4 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1385      +/-   ##
==========================================
- Coverage   14.82%   12.40%   -2.42%     
==========================================
  Files         250      242       -8     
  Lines       36220    36387     +167     
  Branches     4094     4126      +32     
==========================================
- Hits         5369     4514     -855     
- Misses      30800    31869    +1069     
+ Partials       51        4      -47

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Mar 11 '24 09:03 codecov-commenter

Do we need to add some conformance tests?

Yes please.

Mar 11 '24 12:03 kbenzie

Do you have some ideas for adding tests for this?

We do negative testing in the UR CTS, this includes checking that the error codes returned from this entry point are correctly returned when the conditions defined in the specification are met. We should also test the success pass of the entry-point, we might also want to check that the returned local work size doesn't contain obviously invalid data, if possible. At the very least, if the adapter supports this entry point, we should check that it doesn't return an error when provided with valid arguments.

Mar 11 '24 12:03 kbenzie

Remove WIP from PR title since this PR is ready for review?

Mar 12 '24 04:03 wenju-he

I will continue to work on adding some tests for the API. In the meanwhile, please review the API implementation.

Mar 12 '24 04:03 yingcong-wu

I checked out the L0 failures in the CI and tried to reproduce them locally. I did manage to reproduce the failure in local env, but then I checked out main branch and tested again, theose failures are still there. So I think these are not related to this PR.

Mar 12 '24 07:03 yingcong-wu

I checked out the L0 failures in the CI and tried to reproduce them locally. I did manage to reproduce the failure in local env, but then I checked out main branch and tested again, theose failures are still there. So I think these are not related to this PR.

You've hit a broken runner. It's off now so I re-ran the failed job.

Mar 12 '24 09:03 pbalcer

Hi all, I have add some tests to the CTS for the new API, please help review the PR, thank you!

Mar 19 '24 03:03 yingcong-wu

Hi @kbenzie , are those L0 failures known? I try to reproduce the problem in my local but I get some more errors even with the main branch.

40% tests passed, 9 tests failed out of 15

Label Time Summary:
adapter_level_zero    =  10.85 sec*proc (15 tests)
conformance           =  10.85 sec*proc (15 tests)

Total Test time (real) =  10.87 sec

The following tests FAILED:
         17 - platform-adapter_level_zero (Failed)
         18 - device-adapter_level_zero (Failed)
         19 - context-adapter_level_zero (Failed)
         21 - usm-adapter_level_zero (Failed)
         24 - sampler-adapter_level_zero (Failed)
         25 - virtual_memory-adapter_level_zero (Failed)
         26 - kernel-adapter_level_zero (Failed)
         29 - exp_command_buffer-adapter_level_zero (Failed)
         30 - exp_usm_p2p-adapter_level_zero (Failed)

Mar 27 '24 03:03 yingcong-wu

yeah the l0 fails are a known issue, we're working on implementing an acceptable workaround

Mar 27 '24 14:03 aarongreig

yeah the l0 fails are a known issue, we're working on implementing an acceptable workaround

But they shouldn't be showing up in CI, since we disabled the systems where this problem occurs. @lukaszstolarczuk ?

Mar 27 '24 14:03 pbalcer

I allowed myself to re-run the CI. I'm not sure if this is related to our known issue with L0.

Version of 'intel-level-zero-gpu' on our last good runner (pc_intel_31) is 1.3.26241.33-647~22.04, previously we got green results on 1.3.26241.22 (so it's very close) and we got errors on (now disabled) runners with version 1.3.26918.50-736~22.04.

Also, we have passing CI on main on the pc_intel_31 runner (e.g., 3hrs ago: https://github.com/oneapi-src/unified-runtime/actions/runs/8451008567/job/23148409552)

Mar 27 '24 14:03 lukaszstolarczuk

Okay, I will keep looking into the problems then. But it is hard for me to identify the problem with the main branch also reporting many errors. So the 1.3.26241.22 and 1.3.26241.33-647~22.04 are the known good versions of driver, right? Then I will try to use that version and see what happens. Thank you all for the information.

Mar 28 '24 01:03 yingcong-wu

Yeah! I changed the driver to 1.3.26241.14 and the main branch passed the test without any problems. And I can reproduce the failure now. Thank you @lukaszstolarczuk.

Mar 28 '24 02:03 yingcong-wu

Turns out these three failures

urKernelSetArgPointerTest.SuccessHost/Intel_R__oneAPI_Unified_Runtime_over_Level_Zero___{{.*}}_
urKernelSetArgPointerTest.SuccessDevice/Intel_R__oneAPI_Unified_Runtime_over_Level_Zero___{{.*}}_
urKernelSetArgPointerTest.SuccessShared/Intel_R__oneAPI_Unified_Runtime_over_Level_Zero___{{.*}}_

are fixed by the PR, which is exactly the problem in the "Also Fix", which is "For Level-Zero: when LocalWorkSize is provided, urEnqueueKernelLaunch() will read LocalWorkSize without respecting workDim.", so they may have unexpected value in LocalWorkSIze[1] or LocalWorkSize[2] that may trigger the assertion.

    UR_ASSERT(LocalWorkSize[0] < (std::numeric_limits<uint32_t>::max)(),
              UR_RESULT_ERROR_INVALID_VALUE);
    UR_ASSERT(LocalWorkSize[1] < (std::numeric_limits<uint32_t>::max)(),
              UR_RESULT_ERROR_INVALID_VALUE);
    UR_ASSERT(LocalWorkSize[2] < (std::numeric_limits<uint32_t>::max)(),
              UR_RESULT_ERROR_INVALID_VALUE);

Mar 28 '24 06:03 yingcong-wu

Hi all, please help review the PR, thank you!

Mar 29 '24 01:03 yingcong-wu

Hi @kbenzie , @hdelan , @ldrumm , @oneapi-src/unified-runtime-level-zero-write , @oneapi-src/unified-runtime-native-cpu-write , please help review the PR, thank you very much!

Apr 11 '24 01:04 yingcong-wu

Kindly ping @kbenzie , @hdelan , @ldrumm , @oneapi-src/unified-runtime-level-zero-write , @oneapi-src/unified-runtime-native-cpu-write , please help review the PR, thank you very much!

Apr 17 '24 03:04 yingcong-wu

Kindly ping @kbenzie , @hdelan , @ldrumm , @oneapi-src/unified-runtime-level-zero-write , please help review the PR, thank you very much!

Apr 23 '24 05:04 yingcong-wu

Kindly ping @kbenzie , @hdelan , @ldrumm , @pbalcer , @oneapi-src/unified-runtime-level-zero-write , please help review the PR, thank you very much!

Apr 30 '24 02:04 yingcong-wu

Kindly ping @oneapi-src/unified-runtime-level-zero-write , please help review the PR, thank you very much!

May 06 '24 03:05 yingcong-wu

Kindly ping @oneapi-src/unified-runtime-level-zero-write , please help review the PR, thank you very much!

May 13 '24 08:05 yingcong-wu

Kindly ping @oneapi-src/unified-runtime-level-zero-write , please help review the PR, thank you very much!

Hi @nrspruit, @raiyanla, @winstonzhang-intel, could you please review this PR. Thanks!

May 14 '24 01:05 yingcong-wu

Hi @nrspruit, @raiyanla, @winstonzhang-intel, could you please review this PR. Thanks!

May 20 '24 08:05 yingcong-wu

Hi @kbenzie, I got all the approval needed. How do you think I should proceed?

May 21 '24 01:05 yingcong-wu

Hi @kbenzie, I got all the approval needed. How do you think I should proceed?

I've added the ready to merge label, its now in the merge queue so we'll merge this soon.

May 21 '24 10:05 kbenzie

Thank you!

May 22 '24 01:05 yingcong-wu

Its unclear why the CodeQL Windows job is failing, its not an issue with this PR so I'll merge anyway.

Jun 04 '24 09:06 kbenzie