Enable building to target AMD GPUs
This PR modifies CMake scripts throughout dpctl to enable building for AMD GPUs. This is done either by setting the DPCTL_TARGET_AMD environment variable to the intended build architecture, or by passing -DDPCTL_TARGET_AMD to CMake directly.
The _dpctl_sycl_target_compile_options and _dpctl_sycl_target_link_options CMake lists are used to avoid branching logic in later scripts.
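To illustrate the idea, the two lists might be populated along these lines. This is a hypothetical sketch, not the actual diff: the variable and list names follow the description above, and `-fsycl-targets`/`-Xsycl-target-backend`/`--offload-arch` are the standard DPC++ offload options.

```cmake
# Hypothetical sketch: collect per-target options once, so later scripts
# can consume the lists without any if/else on the target backend.
set(_dpctl_sycl_target_compile_options)
set(_dpctl_sycl_target_link_options)

# Allow the environment variable to stand in for the cache option
if(DEFINED ENV{DPCTL_TARGET_AMD} AND NOT DPCTL_TARGET_AMD)
    set(DPCTL_TARGET_AMD $ENV{DPCTL_TARGET_AMD})
endif()

if(DPCTL_TARGET_AMD)
    # Offload to both SPIR-V and the requested AMD architecture (e.g. gfx1030)
    list(APPEND _dpctl_sycl_target_compile_options
        -fsycl-targets=amdgcn-amd-amdhsa,spir64-unknown-unknown
        -Xsycl-target-backend=amdgcn-amd-amdhsa
        --offload-arch=${DPCTL_TARGET_AMD})
    list(APPEND _dpctl_sycl_target_link_options
        -fsycl-targets=amdgcn-amd-amdhsa,spir64-unknown-unknown
        -Xsycl-target-backend=amdgcn-amd-amdhsa
        --offload-arch=${DPCTL_TARGET_AMD})
endif()

# Downstream, targets just append the lists unconditionally:
# target_compile_options(tgt PRIVATE ${_dpctl_sycl_target_compile_options})
# target_link_options(tgt PRIVATE ${_dpctl_sycl_target_link_options})
```

Keeping the options in lists means an empty list is a no-op, so non-AMD builds need no special casing downstream.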
- [X] Have you provided a meaningful PR description?
- [ ] Have you added a test, reproducer or referred to an issue with a reproducer?
- [ ] Have you tested your changes locally for CPU and GPU devices?
- [ ] Have you made sure that new changes do not introduce compiler warnings?
- [ ] Have you checked performance impact of proposed changes?
- [X] If this PR is a work in progress, are you opening the PR as a draft?
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. :crossed_fingers:
coverage: 87.668% (-0.06%) from 87.725% when pulling db487e8a3316db444a5cae3841f71ea3f0de1d2e on feature/enable-amd-builds into 691c225d8efab4d6c3ee4b5fc442de73ee431e54 on master.
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_104 ran successfully. Passed: 894 Failed: 15 Skipped: 105
Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_180 ran successfully. Passed: 893 Failed: 2 Skipped: 119
Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_311 ran successfully. Passed: 894 Failed: 1 Skipped: 119
@ndgrigorian Regrettably, the sycl::log1p change necessary to enable compiling for AMD breaks compiling for CUDA.
Perhaps a preprocessor variable can be used to enable building for SPV/NVPTX or SPV/AMDGCN targets, but not for all three until the bug gets fixed. It may also be possible to write our own implementation of log1p to enable building for all three.
Yes, I only added the commit to make the build failure easy to reproduce.
Writing our own implementation is possible, too. I think that would be preferable, but on the other hand, building for both CUDA and AMD is a corner case.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_164 ran successfully. Passed: 894 Failed: 1 Skipped: 119
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_202 ran successfully. Passed: 894 Failed: 1 Skipped: 119
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_254 ran successfully. Passed: 895 Failed: 0 Skipped: 119
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_261 ran successfully. Passed: 894 Failed: 1 Skipped: 119
@oleksandr-pavlyk @antonwolfy
I have successfully built dpctl for both CUDA and HIP simultaneously using this branch on a machine with CUDA and ROCm installed (no AMD devices, however; only Intel and NVIDIA).
Errors came up while building, but I believe they were caused by running out of memory: they appeared twice in prod.cpp, once in sum.cpp, and once in copy_and_cast_usm_to_usm.cpp. prod, sum, etc. seemed to work on both Level Zero and CUDA without a problem, and the tests passed. Even in verbose mode, the error message wasn't especially helpful.
I have marked this PR as ready for review; the CUDA segfault is resolved.
Worth noting that I tried building with both DPCTL_TARGET_HIP=gfx1100 and DPCTL_TARGET_HIP=gfx1030.
Is this the command to use?
python scripts/build_locally.py --verbose --cmake-opts="-DDPCTL_TARGET_HIP=gfx1030"
Yes, and DPCTL_TARGET_HIP=gfx1030 python scripts/build_locally.py --verbose should work too.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_264 ran successfully. Passed: 894 Failed: 1 Skipped: 119
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_278 ran successfully. Passed: 894 Failed: 1 Skipped: 119