Enable building to target AMD GPUs
This PR modifies CMake scripts throughout dpctl to enable building for AMD GPUs. This is done either by setting the DPCTL_TARGET_AMD environment variable to the intended build architecture, or by passing -DDPCTL_TARGET_AMD to CMake directly.
The _dpctl_sycl_target_compile_options and _dpctl_sycl_target_link_options CMake lists are used to avoid branching logic in later scripts.
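To illustrate the idea, the two lists might be populated along these lines. This is a hypothetical sketch, not the actual diff: the variable and list names follow the description above, and `-fsycl-targets`/`-Xsycl-target-backend`/`--offload-arch` are the standard DPC++ offload options.

```cmake
# Hypothetical sketch: collect per-target options once, so later scripts
# can consume the lists without any if/else on the target backend.
set(_dpctl_sycl_target_compile_options)
set(_dpctl_sycl_target_link_options)

# Allow the environment variable to stand in for the cache option
if(DEFINED ENV{DPCTL_TARGET_AMD} AND NOT DPCTL_TARGET_AMD)
    set(DPCTL_TARGET_AMD $ENV{DPCTL_TARGET_AMD})
endif()

if(DPCTL_TARGET_AMD)
    # Offload to both SPIR-V and the requested AMD architecture (e.g. gfx1030)
    list(APPEND _dpctl_sycl_target_compile_options
        -fsycl-targets=amdgcn-amd-amdhsa,spir64-unknown-unknown
        -Xsycl-target-backend=amdgcn-amd-amdhsa
        --offload-arch=${DPCTL_TARGET_AMD})
    list(APPEND _dpctl_sycl_target_link_options
        -fsycl-targets=amdgcn-amd-amdhsa,spir64-unknown-unknown
        -Xsycl-target-backend=amdgcn-amd-amdhsa
        --offload-arch=${DPCTL_TARGET_AMD})
endif()

# Downstream, targets just append the lists unconditionally:
# target_compile_options(tgt PRIVATE ${_dpctl_sycl_target_compile_options})
# target_link_options(tgt PRIVATE ${_dpctl_sycl_target_link_options})
```

Keeping the options in lists means an empty list is a no-op, so non-AMD builds need no special casing downstream.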
- [X] Have you provided a meaningful PR description?
- [ ] Have you added a test, reproducer or referred to an issue with a reproducer?
- [ ] Have you tested your changes locally for CPU and GPU devices?
- [ ] Have you made sure that new changes do not introduce compiler warnings?
- [ ] Have you checked performance impact of proposed changes?
- [X] If this PR is a work in progress, are you opening the PR as a draft?
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. :crossed_fingers:
coverage: 87.668% (-0.06%) from 87.725% when pulling db487e8a3316db444a5cae3841f71ea3f0de1d2e on feature/enable-amd-builds into 691c225d8efab4d6c3ee4b5fc442de73ee431e54 on master.
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_104 ran successfully. Passed: 894 Failed: 15 Skipped: 105
Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_180 ran successfully. Passed: 893 Failed: 2 Skipped: 119
Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_311 ran successfully. Passed: 894 Failed: 1 Skipped: 119
@ndgrigorian Regrettably, the sycl::log1p change necessary to enable compiling for AMD breaks compiling for CUDA.
Perhaps a preprocessor variable can be used to enable building for SPV/NVPTX or SPV/AMDGCN targets, but not for all three until the bug gets fixed. It may also be possible to write our own implementation of log1p to enable building for all three.
Yes, I only added the commit to make the build failure easy to reproduce.
Writing our own implementation is possible, too. I think that would be preferable, but on the other hand, building for both CUDA and AMD is a corner case.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_164 ran successfully. Passed: 894 Failed: 1 Skipped: 119
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_202 ran successfully. Passed: 894 Failed: 1 Skipped: 119
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_254 ran successfully. Passed: 895 Failed: 0 Skipped: 119
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_261 ran successfully. Passed: 894 Failed: 1 Skipped: 119
@oleksandr-pavlyk @antonwolfy
I have successfully built dpctl for both CUDA and HIP simultaneously using this branch on a machine with CUDA and ROCm installed (no AMD devices, however; only Intel and NVIDIA).
Errors came up while building, but I believe they were caused by running out of memory: they appeared twice in prod.cpp, once in sum.cpp, and once in copy_and_cast_usm_to_usm.cpp. prod, sum, etc. seemed to work on both Level Zero and CUDA without a problem, and the tests passed. Even in verbose mode, the error message wasn't especially helpful.
I have marked this PR as ready for review; the CUDA segfault is resolved.
Worth noting that I tried building with both DPCTL_TARGET_HIP=gfx1100 and DPCTL_TARGET_HIP=gfx1030.
Is this the command to use?
python scripts/build_locally.py --verbose --cmake-opts="-DDPCTL_TARGET_HIP=gfx1030"
Yes, and DPCTL_TARGET_HIP=gfx1030 python scripts/build_locally.py --verbose should work too.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_264 ran successfully. Passed: 894 Failed: 1 Skipped: 119
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_278 ran successfully. Passed: 894 Failed: 1 Skipped: 119