Enable building to target AMD GPUs

Open ndgrigorian opened this issue 1 year ago • 7 comments

This PR modifies the CMake scripts throughout dpctl to enable building for AMD GPUs. To target AMD, either set the DPCTL_TARGET_AMD environment variable to the intended build architecture or pass -DDPCTL_TARGET_AMD to CMake.

The _dpctl_sycl_target_compile_options and _dpctl_sycl_target_link_options CMake lists are used to avoid branching logic in later scripts.

  • [X] Have you provided a meaningful PR description?
  • [ ] Have you added a test, reproducer or referred to an issue with a reproducer?
  • [ ] Have you tested your changes locally for CPU and GPU devices?
  • [ ] Have you made sure that new changes do not introduce compiler warnings?
  • [ ] Have you checked performance impact of proposed changes?
  • [X] If this PR is a work in progress, are you opening the PR as a draft?

ndgrigorian avatar Jul 13 '24 22:07 ndgrigorian

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. :crossed_fingers:

github-actions[bot] avatar Jul 13 '24 23:07 github-actions[bot]

coverage: 87.668% (-0.06%) from 87.725% when pulling db487e8a3316db444a5cae3841f71ea3f0de1d2e on feature/enable-amd-builds into 691c225d8efab4d6c3ee4b5fc442de73ee431e54 on master.

coveralls avatar Jul 13 '24 23:07 coveralls

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_104 ran successfully. Passed: 894 Failed: 15 Skipped: 105

github-actions[bot] avatar Jul 14 '24 19:07 github-actions[bot]

Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_180 ran successfully. Passed: 893 Failed: 2 Skipped: 119

github-actions[bot] avatar Jul 29 '24 15:07 github-actions[bot]

Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_311 ran successfully. Passed: 894 Failed: 1 Skipped: 119

github-actions[bot] avatar Aug 15 '24 18:08 github-actions[bot]

@ndgrigorian Regrettably, the switch to sycl::log1p, which is necessary to enable compiling for AMD, breaks compiling for CUDA.

Perhaps a preprocessor variable can be used to enable building for SPV/NVPTX or SPV/AMDGCN targets, but not for all three until the bug is fixed. It may also be possible to write our own implementation of log1p to enable building for all three.

oleksandr-pavlyk avatar Aug 17 '24 18:08 oleksandr-pavlyk

Yes, I only added the commit to make it convenient for the build failure to be reproduced.

Writing our own implementation is possible, too. I think that would be preferable, but on the other hand, it's a corner case to build for both CUDA and AMD.

ndgrigorian avatar Aug 17 '24 21:08 ndgrigorian
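
The two options discussed above (a preprocessor switch and a hand-written log1p) could look roughly like the following. This is only a sketch under stated assumptions: portable_log1p, log1p_dispatch, and the EXAMPLE_USE_PORTABLE_LOG1P macro are hypothetical names that do not come from dpctl, and the code uses std::log and std::log1p on the host so that it can be compiled anywhere. In device code the corresponding sycl:: functions would presumably be used instead. The implementation relies on a standard compensation trick: compute u = 1 + x, then scale log(u) by x / (u - 1) to undo the rounding error introduced when forming u.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper, not part of dpctl: log1p built only from log,
// so it does not depend on a backend-provided log1p.
template <typename T>
T portable_log1p(T x) {
    const T u = T(1) + x;
    if (u == T(1)) {
        // x is too small to change 1 + x, so log1p(x) rounds to x itself.
        return x;
    }
    if (!std::isfinite(u)) {
        // Propagates +inf and NaN; -inf yields NaN via log of a negative value.
        return std::log(u);
    }
    // (u - 1) is the part of x that survived rounding in 1 + x;
    // the ratio x / (u - 1) corrects log(u) for the part that was lost.
    return std::log(u) * (x / (u - T(1)));
}

// Hypothetical compile-time switch: define EXAMPLE_USE_PORTABLE_LOG1P
// to route calls to the hand-written version instead of the library one.
template <typename T>
T log1p_dispatch(T x) {
#if defined(EXAMPLE_USE_PORTABLE_LOG1P)
    return portable_log1p(x);
#else
    return std::log1p(x);
#endif
}
```

With a switch like this, a build that targets several backends at once could define the macro, while single-target builds keep the library function. Whether the accuracy and performance of the hand-written version are acceptable on each device would still need to be checked.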

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_164 ran successfully. Passed: 894 Failed: 1 Skipped: 119

github-actions[bot] avatar Oct 24 '24 20:10 github-actions[bot]

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_202 ran successfully. Passed: 894 Failed: 1 Skipped: 119

github-actions[bot] avatar Nov 11 '24 22:11 github-actions[bot]

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_254 ran successfully. Passed: 895 Failed: 0 Skipped: 119

github-actions[bot] avatar Nov 20 '24 23:11 github-actions[bot]

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_261 ran successfully. Passed: 894 Failed: 1 Skipped: 119

github-actions[bot] avatar Nov 21 '24 03:11 github-actions[bot]

@oleksandr-pavlyk @antonwolfy

I have successfully built dpctl for both CUDA and HIP simultaneously using this branch on a machine with CUDA and ROCm installed (no AMD devices, however; only Intel and NVIDIA).

Errors came up while building, but I believe these were caused by running out of memory (OOM): errors showed up twice in prod.cpp, once in sum.cpp, and once in copy_and_cast_usm_to_usm.cpp. prod, sum, etc. seemed to work on both level-zero and CUDA without a problem, and tests passed. Even in verbose mode, the error message wasn't especially helpful.

I have marked this PR as ready for review; the CUDA segfault is resolved.

ndgrigorian avatar Nov 21 '24 04:11 ndgrigorian

Worth noting that I tried building with both DPCTL_TARGET_HIP=gfx1100 and DPCTL_TARGET_HIP=gfx1030.

ndgrigorian avatar Nov 21 '24 04:11 ndgrigorian

Is this the command to use?

python scripts/build_locally.py --verbose --cmake-opts="-DDPCTL_TARGET_HIP=gfx1030"

oleksandr-pavlyk avatar Nov 21 '24 16:11 oleksandr-pavlyk

Yes, and setting the environment variable instead should work too:

DPCTL_TARGET_HIP=gfx1030 python scripts/build_locally.py --verbose

ndgrigorian avatar Nov 21 '24 17:11 ndgrigorian

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_264 ran successfully. Passed: 894 Failed: 1 Skipped: 119

github-actions[bot] avatar Nov 21 '24 23:11 github-actions[bot]

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_264 ran successfully. Passed: 894 Failed: 1 Skipped: 119

github-actions[bot] avatar Nov 22 '24 00:11 github-actions[bot]

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_278 ran successfully. Passed: 894 Failed: 1 Skipped: 119

github-actions[bot] avatar Nov 22 '24 19:11 github-actions[bot]