
[ROCm] add support for ROCm/HIP device

Open jeffdaily opened this issue 2 years ago • 20 comments

To build for ROCm:

./helpers/hipify.sh
mkdir build
cd build
cmake -DUSE_ROCM=1 ..

CUDA source files are hipified in-place using the helper script before running cmake. The "cuda" device is re-used for rocm, so device=cuda will work the same for rocm builds.
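
The in-place hipify step is essentially a mechanical source translation: CUDA runtime identifiers are rewritten to their HIP equivalents so the same kernels compile for AMD GPUs. A minimal Python sketch of the idea — the mapping table here is a tiny illustrative subset, and the real `helpers/hipify.sh` relies on AMD's hipify tooling rather than anything this simple:

```python
# Simplified sketch of the in-place CUDA -> HIP source translation.
# The real helpers/hipify.sh uses AMD's hipify tooling; this mapping is a
# tiny illustrative subset of the CUDA runtime API, not full coverage.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaStream_t": "hipStream_t",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    """Rewrite CUDA runtime identifiers to their HIP equivalents."""
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(hipify("cudaMalloc(&ptr, n); cudaDeviceSynchronize();"))
# hipMalloc(&ptr, n); hipDeviceSynchronize();
```

Because the translation happens in place, the original CUDA sources are preserved as `*.prehip` files (hence the `.gitignore` entry below).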

Summary of changes:

  • CMakeLists.txt ROCm updates, also replace glob with explicit file list
  • support both warpSize 32 and 64
  • helpers/hipify.sh script added
  • .gitignore to ignore generated hip source files *.prehip
  • disable compiler warnings
  • move __device__ template function PercentileDevice into header
  • bug fixes for __host__ __device__

jeffdaily avatar Sep 08 '23 16:09 jeffdaily

@jeffdaily Thank you, this is very exciting! @jameslamb ROCm is the counterpart of CUDA for AMD GPUs. I haven't had any prior discussion with @jeffdaily about this, but it would be very exciting to expand the set of devices supported by LightGBM.

shiyu1994 avatar Sep 08 '23 16:09 shiyu1994

Apologies for coming out of nowhere with this. We use LightGBM; the OpenCL-based 'gpu' device already works on our AMD GPUs. But we were curious if we could get better performance if we ported the 'cuda' device to AMD GPUs. This started as a proof of concept, but it seemed useful to share even in its current state.

Using the GPU-Tutorial, here are my results on our MI210.

| what is evaluated | CPU | GPU/OpenCL | "cuda" (really ROCm) |
| --- | --- | --- | --- |
| correctness | auc: 0.821268, 18.547533 s | auc: 0.821268, 20.386780 s | auc: 0.821268, 9.049307 s |
| speed, objective=binary metric=auc | 22.604444 s | 18.028674 s | 7.787303 s |
| speed, objective=regression_l2 metric=l2 | 18.961535 s | 14.491217 s | 7.871302 s |

jeffdaily avatar Sep 08 '23 17:09 jeffdaily

  • what is ROCm/HIP? Where can we read to learn more?

https://rocm.docs.amd.com/en/latest/rocm.html

  • what is the value of this addition to LightGBM's users? What does this offer that the OpenCL-based and CUDA-based builds of LightGBM don't already offer?

See the perf results from the comment above.

  • this project's OpenCL-based GPU build is already struggling from a severe lack of maintenance... I'm very skeptical of taking on a third GPU build
  • how might we test this? What types of devices should we expect to be supported?

Here is the current list of supported AMD GPUs.

To test this, you'll need to run on one of the supported AMD GPUs. How is the cuda device currently tested?

jeffdaily avatar Sep 08 '23 17:09 jeffdaily

Thank you and kudos Jeff! This work has been much needed! Best regards, Ismail

ibustany avatar Sep 08 '23 19:09 ibustany

To test this, you'll need to run on one of the supported AMD GPUs. How is the cuda device currently tested?

We run a VM in Azure with a Tesla V100 on it, and schedule jobs onto it via GitHub Actions.

  • example build link: https://github.com/microsoft/LightGBM/actions/runs/6123938185/job/16622920873#step:5:34-51
  • configuration: https://github.com/microsoft/LightGBM/blob/04b66e066228a947e5d713626e5b14439ada0909/.github/workflows/cuda.yml#L25
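
If a self-hosted AMD GPU runner were ever made available, the existing `cuda.yml` pattern could plausibly be extended along these lines. This is purely a sketch: the runner label, image, and build steps are assumptions modeled on the workflow linked above, not real infrastructure.

```yaml
# Hypothetical sketch modeled on the existing cuda.yml workflow.
# The runner label and container image are assumptions, not real infra.
name: ROCm Version
on: [push, pull_request]
jobs:
  test:
    runs-on: [self-hosted, amd-gpu]   # hypothetical self-hosted AMD GPU runner
    container:
      image: rocm/dev-ubuntu-22.04    # official ROCm development image
      # /dev/kfd and /dev/dri must be passed through for GPU access
      options: --device=/dev/kfd --device=/dev/dri
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: |
          ./helpers/hipify.sh
          cmake -B build -DUSE_ROCM=1
          cmake --build build -j4
```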

Are you aware of any free CI service supporting AMD GPUs? Otherwise, since I see you work for AMD and since merging this might further AMD's interests... would AMD maybe be willing to fund testing resources for this project? Maybe that's something you and @shiyu1994 (the only maintainer here who's employed by Microsoft) could coordinate?

jameslamb avatar Sep 08 '23 20:09 jameslamb

Are you aware of any free CI service supporting AMD GPUs? Otherwise, since I see you work for AMD and since merging this might further AMD's interests... would AMD maybe be willing to fund testing resources for this project? Maybe that's something you and @shiyu1994 (the only maintainer here who's employed by Microsoft) could coordinate?

Microsoft does have an AMD GPU deployment. I'm aware of it being used for onnxruntime CI purposes. I wonder if some of those resources could be used here? @shiyu1994?

jeffdaily avatar Sep 08 '23 20:09 jeffdaily

Noting that the only current CI failure is unrelated to my changes; it appears to be a temporary environment setup issue for that job.

jeffdaily avatar Sep 08 '23 22:09 jeffdaily

I have access to some AMD MI100 GPUs, but we would still need a separate budget for a CI agent with an AMD GPU if we want to test automatically. Do you think it is acceptable if I run the tests for AMD GPU offline without an additional agent for CI, given that the GPU code is shared by both CUDA and ROCm? @jameslamb @guolinke @jeffdaily

shiyu1994 avatar Sep 13 '23 03:09 shiyu1994

Do you think it is acceptable if I run the tests for AMD GPU offline without an additional agent for ci?

If you feel confident in these changes based on that, and you think the added complexity in the CUDA code is worth it, that's fine with me. I'll defer to your opinion.

But without a CI job, there's a high risk that future refactorings will break this support again.

jameslamb avatar Sep 13 '23 14:09 jameslamb

I dismissed my review, so that it doesn't block merging. My initial questions have been answered, thanks very much for those links and all that information!

@shiyu1994 and @guolinke seem excited about this addition... that's good enough for me 😊

I'll defer to them to review the code, as I know very little about CUDA.

jameslamb avatar Sep 13 '23 14:09 jameslamb

@jeffdaily Thanks for the great work! I'll review this in the next few days.

shiyu1994 avatar Oct 08 '23 15:10 shiyu1994

Thanks again for the contribution. I just got a Windows server with an AMD MI25 GPU, and I'm trying to use that server as a CI agent. Hopefully it won't be difficult.

shiyu1994 avatar Dec 01 '23 15:12 shiyu1994