Implement urKernelGetSuggestedLocalWorkSize
This PR try to implement the API urKernelGetSuggestedLocalWorkSize, discussed in https://github.com/oneapi-src/unified-runtime/issues/1270.
SYCLOS PR: https://github.com/intel/llvm/pull/12902
Also fix:
- For Level-Zero: when
LocalWorkSizeis provided,urEnqueueKernelLaunch()will readLocalWorkSizewithout respectingworkDim.
Do we need to add some conformance tests?
Do we need to add some conformance tests?
Yang has asked me about this, but I can't think of anything meaningful. Do you have some ideas for adding tests for this?
Codecov Report
Attention: Patch coverage is 5.51724% with 137 lines in your changes are missing coverage. Please review.
Project coverage is 12.40%. Comparing base (
78ef1ca) to head (9850c12). Report is 188 commits behind head on main.
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@ Coverage Diff @@
## main #1385 +/- ##
==========================================
- Coverage 14.82% 12.40% -2.42%
==========================================
Files 250 242 -8
Lines 36220 36387 +167
Branches 4094 4126 +32
==========================================
- Hits 5369 4514 -855
- Misses 30800 31869 +1069
+ Partials 51 4 -47
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Do we need to add some conformance tests?
Yes please.
Do you have some ideas for adding tests for this?
We do negative testing in the UR CTS, this includes checking that the error codes returned from this entry point are correctly returned when the conditions defined in the specification are met. We should also test the success pass of the entry-point, we might also want to check that the returned local work size doesn't contain obviously invalid data, if possible. At the very least, if the adapter supports this entry point, we should check that it doesn't return an error when provided with valid arguments.
Remove WIP from PR title since this PR is ready for review?
I will continue to work on adding some tests for the API. In the meanwhile, please review the API implementation.
I checked out the L0 failures in the CI and tried to reproduce them locally. I did manage to reproduce the failure in local env, but then I checked out main branch and tested again, theose failures are still there. So I think these are not related to this PR.
I checked out the L0 failures in the CI and tried to reproduce them locally. I did manage to reproduce the failure in local env, but then I checked out main branch and tested again, theose failures are still there. So I think these are not related to this PR.
You've hit a broken runner. It's off now so I re-ran the failed job.
Hi all, I have add some tests to the CTS for the new API, please help review the PR, thank you!
Hi @kbenzie , are those L0 failures known? I try to reproduce the problem in my local but I get some more errors even with the main branch.
40% tests passed, 9 tests failed out of 15
Label Time Summary:
adapter_level_zero = 10.85 sec*proc (15 tests)
conformance = 10.85 sec*proc (15 tests)
Total Test time (real) = 10.87 sec
The following tests FAILED:
17 - platform-adapter_level_zero (Failed)
18 - device-adapter_level_zero (Failed)
19 - context-adapter_level_zero (Failed)
21 - usm-adapter_level_zero (Failed)
24 - sampler-adapter_level_zero (Failed)
25 - virtual_memory-adapter_level_zero (Failed)
26 - kernel-adapter_level_zero (Failed)
29 - exp_command_buffer-adapter_level_zero (Failed)
30 - exp_usm_p2p-adapter_level_zero (Failed)
yeah the l0 fails are a known issue, we're working on implementing an acceptable workaround
yeah the l0 fails are a known issue, we're working on implementing an acceptable workaround
But they shouldn't be showing up in CI, since we disabled the systems where this problem occurs. @lukaszstolarczuk ?
I allowed myself to re-run the CI. I'm not sure if this is related to our known issue with L0.
Version of 'intel-level-zero-gpu' on our last good runner (pc_intel_31) is 1.3.26241.33-647~22.04, previously we got green results on 1.3.26241.22 (so it's very close) and we got errors on (now disabled) runners with version 1.3.26918.50-736~22.04.
Also, we have passing CI on main on the pc_intel_31 runner (e.g., 3hrs ago: https://github.com/oneapi-src/unified-runtime/actions/runs/8451008567/job/23148409552)
Okay, I will keep looking into the problems then. But it is hard for me to identify the problem with the main branch also reporting many errors. So the 1.3.26241.22 and 1.3.26241.33-647~22.04 are the known good versions of driver, right? Then I will try to use that version and see what happens. Thank you all for the information.
Yeah! I changed the driver to 1.3.26241.14 and the main branch passed the test without any problems. And I can reproduce the failure now. Thank you @lukaszstolarczuk.
Turns out these three failures
urKernelSetArgPointerTest.SuccessHost/Intel_R__oneAPI_Unified_Runtime_over_Level_Zero___{{.*}}_
urKernelSetArgPointerTest.SuccessDevice/Intel_R__oneAPI_Unified_Runtime_over_Level_Zero___{{.*}}_
urKernelSetArgPointerTest.SuccessShared/Intel_R__oneAPI_Unified_Runtime_over_Level_Zero___{{.*}}_
are fixed by the PR, which is exactly the problem in the "Also Fix", which is "For Level-Zero: when LocalWorkSize is provided, urEnqueueKernelLaunch() will read LocalWorkSize without respecting workDim.", so they may have unexpected value in LocalWorkSIze[1] or LocalWorkSize[2] that may trigger the assertion.
UR_ASSERT(LocalWorkSize[0] < (std::numeric_limits<uint32_t>::max)(),
UR_RESULT_ERROR_INVALID_VALUE);
UR_ASSERT(LocalWorkSize[1] < (std::numeric_limits<uint32_t>::max)(),
UR_RESULT_ERROR_INVALID_VALUE);
UR_ASSERT(LocalWorkSize[2] < (std::numeric_limits<uint32_t>::max)(),
UR_RESULT_ERROR_INVALID_VALUE);
Hi all, please help review the PR, thank you!
Hi @kbenzie , @hdelan , @ldrumm , @oneapi-src/unified-runtime-level-zero-write , @oneapi-src/unified-runtime-native-cpu-write , please help review the PR, thank you very much!
Kindly ping @kbenzie , @hdelan , @ldrumm , @oneapi-src/unified-runtime-level-zero-write , @oneapi-src/unified-runtime-native-cpu-write , please help review the PR, thank you very much!
Kindly ping @kbenzie , @hdelan , @ldrumm , @oneapi-src/unified-runtime-level-zero-write , please help review the PR, thank you very much!
Kindly ping @kbenzie , @hdelan , @ldrumm , @pbalcer , @oneapi-src/unified-runtime-level-zero-write , please help review the PR, thank you very much!
Kindly ping @oneapi-src/unified-runtime-level-zero-write , please help review the PR, thank you very much!
Kindly ping @oneapi-src/unified-runtime-level-zero-write , please help review the PR, thank you very much!
Kindly ping @oneapi-src/unified-runtime-level-zero-write , please help review the PR, thank you very much!
Hi @nrspruit, @raiyanla, @winstonzhang-intel, could you please review this PR. Thanks!
Hi @nrspruit, @raiyanla, @winstonzhang-intel, could you please review this PR. Thanks!
Hi @kbenzie, I got all the approval needed. How do you think I should proceed?
Hi @kbenzie, I got all the approval needed. How do you think I should proceed?
I've added the ready to merge label, its now in the merge queue so we'll merge this soon.
Thank you!
Its unclear why the CodeQL Windows job is failing, its not an issue with this PR so I'll merge anyway.