re-organize rocm header for the upcomming rocm update
Deploy Preview for pytorch-fbgemm-docs canceled.
| Name | Link |
|---|---|
| Latest commit | 5719c3e81222160eb3a8b7ca38961905c9fdf458 |
| Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/654c4953fa9378000835ad06 |
No ciflow labels are configured for this repo. For information on how to enable CIFlow bot see this wiki
Thanks for the contributions! It'd be nice if you could add explanation a little bit :)
Judging from the changes, ROCm 5.7 has headers at both old and new places, while ROCm 6.0 will deprecate those places. Is my understanding correct?
#include <hipblas.h> // old
#include <hipblas/hipblas.h> // new?
#include <rocm_version.h> // old
#include <rocm-core/rocm_version.h> // new?
Thanks for the contributions! It'd be nice if you could add explanation a little bit :)
Judging from the changes, ROCm 5.7 has headers at both old and new places, while ROCm 6.0 will deprecate those places. Is my understanding correct?
#include <hipblas.h> // old #include <hipblas/hipblas.h> // new? #include <rocm_version.h> // old #include <rocm-core/rocm_version.h> // new?
Sorry, I didn't explain the changes. Your reasoning is correct. ROCm5.7 is a transition from the old places to the new ones, and ROCm6.0 will deprecate the old header.
Thanks! It looks great to me. Let me import it and merge it. (The linter issue is not coming from this PR, so don't worry about it.)
Thanks! It looks great to me. Let me import it and merge it. (The linter issue is not coming from this PR, so don't worry about it.)
Thanks a lot for your help!
@shintaro-iwasaki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hmm, 5.6 test is failing: Build x86 Linux Wheels / pytorch/FBGEMM / manywheel-py3_8-rocm5_6
/__w/FBGEMM/FBGEMM/pytorch/FBGEMM/fbgemm_gpu/src/split_embeddings_utils.hip:27:10: fatal error: 'rocm-core/rocm_version.h' file not found
#include <rocm-core/rocm_version.h>
@q10 Do you know if we still need to keep this job active?
Hmm, 5.6 test is failing:
Build x86 Linux Wheels / pytorch/FBGEMM / manywheel-py3_8-rocm5_6/__w/FBGEMM/FBGEMM/pytorch/FBGEMM/fbgemm_gpu/src/split_embeddings_utils.hip:27:10: fatal error: 'rocm-core/rocm_version.h' file not found #include <rocm-core/rocm_version.h>@q10 Do you know if we still need to keep this job active?
ROCm5.6 is expected to fail. I suggest moving to ROCm5.7; otherwise, there would be more divergences when enabling the UVM tests.
Hmm, 5.6 test is failing:
Build x86 Linux Wheels / pytorch/FBGEMM / manywheel-py3_8-rocm5_6/__w/FBGEMM/FBGEMM/pytorch/FBGEMM/fbgemm_gpu/src/split_embeddings_utils.hip:27:10: fatal error: 'rocm-core/rocm_version.h' file not found #include <rocm-core/rocm_version.h>@q10 Do you know if we still need to keep this job active?
We can disable it, since the Nova builds for ROCm are broken anyway - the wheel can be built but installing the wheel and running tests will result in segfaults (see example run here). cc @liligwu
Hmm, 5.6 test is failing:
Build x86 Linux Wheels / pytorch/FBGEMM / manywheel-py3_8-rocm5_6/__w/FBGEMM/FBGEMM/pytorch/FBGEMM/fbgemm_gpu/src/split_embeddings_utils.hip:27:10: fatal error: 'rocm-core/rocm_version.h' file not found #include <rocm-core/rocm_version.h>@q10 Do you know if we still need to keep this job active?
We can disable it, since the Nova builds for ROCm are broken anyway - the wheel can be built but installing the wheel and running tests will result in segfaults (see example run here). cc @liligwu
Hi @q10 . Yes, the unary test is an old issue. I saw it was skipped in the test https://github.com/pytorch/FBGEMM/blob/bd2d5fcb953dc64e3d07102de363e2c76fe25fbd/.github/scripts/fbgemm_gpu_test.bash#L94 We will take a look at it if you are concerned about the failure.
@q10 Do you know how to disable it? Is it something @liligwu can just remove 5.6 from one of the GHA yml files? It'd be good to disable this test before merging this PR.
@q10 Do you know how to disable it? Is it something @liligwu can just remove 5.6 from one of the GHA yml files? It'd be good to disable this test before merging this PR.
This line will need to be updated to disable. Unfortunately, ROCm versions in Nova runs are decided by PyTorch infra, not us, so we aren't able to say "keep 5.7 and disable 5.6".
Thanks! I see. If that's the case, can we keep 5.6 by adding #ifdef @liligwu until the PyTorch infra disables 5.6 (@liligwu)? We don't want to make the test red while we want to keep the ROCm test if possible.
We will need some versioning based on the ROCm version for a while, but hope it is not too burdensome.
Thanks! I see. If that's the case, can we keep
5.6by adding#ifdef@liligwu until the PyTorch infra disables 5.6 (@liligwu)? We don't want to make the test red while we want to keep the ROCm test if possible.We will need some versioning based on the ROCm version for a while, but hope it is not too burdensome.
Thanks @q10 for the pointer. I'll figure out how to keep rocm5.6 for the header changes. Please hold the merge @shintaro-iwasaki
But I don't have a good idea that switches UVM test enablement between 5.6 and 5.7 currently. I'll let you know if I can find an approach.
CUDA has torch.version.cuda. Maybe torch.version.hip returns something useful?
CUDA has
torch.version.cuda. Maybetorch.version.hipreturns something useful?
Sounds good. I'll try it out. Thanks.