[Issue]: Crash while compiling rocSPARSE
Problem Description
rocSPARSE compilation crashes instead of either producing an error or succeeding. Build log attached: fail.txt
Operating System
Arch Linux, kernel 6.9.7-arch1-1
CPU
AMD Threadripper 1950X
GPU
AMD Radeon RX 7900 XTX
ROCm Version
ROCm 6.1.0
ROCm Component
rocSPARSE
Steps to Reproduce
After compiling all prerequisites, try running the following (or something similar):
cd $BASEDIR
[[ -n "${BASEDIR}" ]] && rm -rf "$BASEDIR/14_sparse"
mkdir -p 14_sparse
cd 14_sparse
mkdir -p build
DEST="$BASEDIR/14_sparse/build"
git clone https://github.com/ROCmSoftwarePlatform/rocSPARSE
cd rocSPARSE
cmake \
-Wno-dev \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_CXX_COMPILER=${ROCM_INSTALL_DIR}/bin/hipcc \
-D CMAKE_CXX_FLAGS="${CXXFLAGS} -fcf-protection=none" \
-D CMAKE_INSTALL_PREFIX=${ROCM_INSTALL_DIR} \
-G Ninja \
$BASEDIR/14_sparse/rocSPARSE
"${NINJA:=ninja}" $NUMJOBS
DESTDIR=$DEST "$NINJA" install
Thanks for raising this issue.
I have a couple of questions that might help track down the problem:
- It looks like you are cloning rocSPARSE and using the latest develop branch. Can you tell me which specific commit ID you are using?
- From the log, it looks like you are hitting an issue when compiling a rocSPARSE routine that uses rocPRIM. I assume you are using the rocPRIM that came with ROCm 6.1, rather than cloning rocPRIM and installing it from the latest source before compiling rocSPARSE. Is that correct?
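A minimal sketch for collecting both answers. It assumes rocPRIM installs a generated header `rocprim/rocprim_version.hpp` containing `ROCPRIM_VERSION_MAJOR`, `ROCPRIM_VERSION_MINOR` and `ROCPRIM_VERSION_PATCH` defines. The helper names `rocprim_version` and `source_commit` are hypothetical, and the paths in the usage comments are only examples.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reporting which rocSPARSE commit is checked out
# and which rocPRIM version is installed.
# Assumption: rocPRIM installs rocprim/rocprim_version.hpp with
# ROCPRIM_VERSION_MAJOR/MINOR/PATCH defines.

# Print "MAJOR.MINOR.PATCH" parsed from a rocprim_version.hpp file.
rocprim_version() {
    local header="$1"
    local major minor patch
    major=$(awk '$1 == "#define" && $2 == "ROCPRIM_VERSION_MAJOR" { print $3 }' "$header")
    minor=$(awk '$1 == "#define" && $2 == "ROCPRIM_VERSION_MINOR" { print $3 }' "$header")
    patch=$(awk '$1 == "#define" && $2 == "ROCPRIM_VERSION_PATCH" { print $3 }' "$header")
    # Fail if any component is missing (wrong file or unexpected layout).
    [[ -n $major && -n $minor && -n $patch ]] || return 1
    echo "${major}.${minor}.${patch}"
}

# Print the full commit ID checked out in a git working tree.
source_commit() {
    git -C "$1" rev-parse HEAD
}

# Usage (example paths):
#   source_commit "$BASEDIR/14_sparse/rocSPARSE"
#   rocprim_version "${ROCM_INSTALL_DIR}/include/rocprim/rocprim_version.hpp"
```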
The issue appears to be caused by using rocPRIM from ROCm 6.1 while compiling the latest rocSPARSE. The specific offending commit in rocSPARSE is 81e4f9527b825195f53c8e3b660f6a699af829b7. I'm investigating a solution now. As a temporary workaround, compiling the latest rocPRIM first and then compiling the latest rocSPARSE should work.
A dependency diagram would be helpful... I don't know what depends on what, and there are 40 of these repositories before I reach my goal.
But no, I am installing all of it by cloning and compiling, because I want to be able to debug problems. However, because the process takes so long, patches or new versions may be released partway through it. Compiling everything takes several days, and I'm still debugging the whole process and writing patches to work around Ubuntu-specific assumptions, and so on.
Putting up a PR with the fix now. I'll comment here once it is merged.
Regarding dependencies: rocSPARSE currently depends on rocPRIM and, optionally, rocBLAS. The rocPRIM dependency is mentioned in the docs (see https://rocm.docs.amd.com/projects/rocSPARSE/en/latest/install/Linux_Install_Guide.html#linux-install), but I agree we should present this information better. In particular, I don't think we are currently clear about how rocSPARSE should behave when, say, the latest rocSPARSE is used with an older version of rocPRIM (within the same major version). I'll look into improving that.
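A build order can be derived mechanically from "dependency before dependent" pairs. Here is a minimal sketch using the standard `tsort` utility, which reads whitespace-separated pairs and prints a topological order. It lists only the two edges named in this comment; rocBLAS is optional, so that line can be dropped if rocBLAS is not used. `build_order` is a hypothetical helper name. Edges for the other repositories would have to come from each project's own install docs.

```shell
#!/usr/bin/env bash
# Print a build order: each input pair "A B" means A must be built before B.
# Only the edges stated in this thread are listed; rocBLAS is optional.
build_order() {
    tsort <<'EOF'
rocPRIM rocSPARSE
rocBLAS rocSPARSE
EOF
}

build_order
```

Adding more pairs, one per line, extends the same order to a larger set of repositories. `tsort` reports an error if the pairs contain a cycle.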
Correcting something incorrect I said in my previous comment:
I identified the cause of the compilation failures you are seeing: you are using rocPRIM 3.1.0 (the version that came with your ROCm 6.1 installation) while compiling rocSPARSE from the latest develop branch. Specifically, rocSPARSE develop relies on a change introduced in rocPRIM 3.2.0 (the develop branch is much further ahead than what was packaged with your ROCm 6.1 installation). This caused compilation failures with rocPRIM 3.1.0, since that version does not have those changes. All of this is correct.
The incorrect statement was about how rocSPARSE should work with older versions of rocPRIM. It actually works the opposite way from what I said. Given a ROCm release with, say, rocPRIM 3.1.0 and rocSPARSE 3.1.0, it should be possible to rebuild rocSPARSE 3.1.0 against any later rocPRIM 3.Y.Z, where 3.Y.Z >= 3.1.0, up to the next major version change.
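The compatibility rule above can be written as a small check. This is a minimal sketch that assumes you already know which rocPRIM version a given rocSPARSE source tree was paired with. It does not assume that rocSPARSE and rocPRIM share version numbers. `rocprim_compatible` is a hypothetical helper, not part of any ROCm tool. It expects plain numeric `MAJOR.MINOR.PATCH` strings with MINOR and PATCH below 1000.

```shell
#!/usr/bin/env bash
# Hypothetical helper: can installed rocPRIM version $2 build a rocSPARSE
# source tree that was paired with rocPRIM version $1?
# Rule: same major version, and installed MAJOR.MINOR.PATCH >= paired version.
# Assumes plain numeric versions with MINOR and PATCH below 1000.
rocprim_compatible() {
    local p_major p_minor p_patch i_major i_minor i_patch
    IFS=. read -r p_major p_minor p_patch <<< "$1"
    IFS=. read -r i_major i_minor i_patch <<< "$2"
    # A different major version is outside the compatibility rule.
    if (( i_major != p_major )); then
        return 1
    fi
    # Same major: installed MINOR.PATCH must not be older than the paired one.
    if (( i_minor * 1000 + i_patch >= p_minor * 1000 + p_patch )); then
        return 0
    fi
    return 1
}

# The case in this issue: rocSPARSE develop relies on a change from
# rocPRIM 3.2.0, while ROCm 6.1 ships rocPRIM 3.1.0.
if rocprim_compatible 3.2.0 3.1.0; then
    echo "rocPRIM 3.1.0 is new enough"
else
    echo "rocPRIM 3.1.0 is too old: build a newer rocPRIM 3.x first"
fi
```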
This then explains the failure as trying to build rocSPARSE 3.2.0 with rocPRIM 3.1.0 is not supported.
Recommendations: I think these are some good rules to follow:
- If you are building rocSPARSE for active development (you plan to open PRs into the develop branch adding new functionality, etc.), first clone and build rocPRIM from its latest develop branch, then build rocSPARSE from the latest develop branch.
- If you are building rocSPARSE from source just to, say, target an architecture that is not included by default, it may be better to clone and build one of the release branches (for example release/rocm-rel-6.2) instead of develop, since these branches are more stable. As before, first clone and build rocPRIM from release/rocm-rel-6.2, then build rocSPARSE from release/rocm-rel-6.2.
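Both recommendations follow the same order of operations: rocPRIM first, then rocSPARSE, from the same branch. Here is a minimal sketch under several assumptions not stated in the thread: the install prefix is `ROCM_INSTALL_DIR` as in the reproduction steps, `hipcc` lives in its `bin/` directory, both repositories have the chosen branch, and rocSPARSE finds the freshly installed rocPRIM through `CMAKE_PREFIX_PATH`. By default it only prints the commands. Set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: build rocPRIM first, then rocSPARSE, both from the same branch.
# Assumptions: install prefix ROCM_INSTALL_DIR, hipcc in its bin/ directory,
# rocPRIM found by rocSPARSE via CMAKE_PREFIX_PATH.
set -euo pipefail

BRANCH="${BRANCH:-release/rocm-rel-6.2}"      # use "develop" for active development
ROCM_INSTALL_DIR="${ROCM_INSTALL_DIR:-/opt/rocm}"
WORKDIR="${WORKDIR:-$PWD/rocm-src}"          # where sources are cloned
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print commands only, 0 = run them

# Print the command (dry run) or execute it.
run() {
    if [[ "$DRY_RUN" != 0 ]]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Clone, configure, build and install one component from $BRANCH.
build_component() {
    local name="$1"
    local src="$WORKDIR/$name"
    run git clone --branch "$BRANCH" --depth 1 \
        "https://github.com/ROCmSoftwarePlatform/${name}" "$src"
    run cmake -S "$src" -B "$src/build" -G Ninja \
        -D CMAKE_BUILD_TYPE=Release \
        -D CMAKE_CXX_COMPILER="${ROCM_INSTALL_DIR}/bin/hipcc" \
        -D CMAKE_INSTALL_PREFIX="${ROCM_INSTALL_DIR}" \
        -D CMAKE_PREFIX_PATH="${ROCM_INSTALL_DIR}"
    run cmake --build "$src/build"
    run cmake --install "$src/build"
}

build_all() {
    # Order matters: rocSPARSE compiles against the rocPRIM installed first.
    build_component rocPRIM
    build_component rocSPARSE
}

build_all
```

For example, if the script is saved as `build.sh` (a hypothetical name), `BRANCH=develop DRY_RUN=0 ./build.sh` follows the development workflow. Installing into the ROCm prefix replaces the rocPRIM headers that shipped with ROCm, which may require root privileges.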
Hi @AphidGit, were you able to compile rocSPARSE successfully after compiling an appropriate version of rocPRIM, as @jsandham suggested above? Note that the rocSPARSE docs have been updated to better explain rocSPARSE's dependency on rocPRIM, and the changes should appear in an upcoming release. If you have any follow-up questions or concerns, let us know; otherwise, we can close this issue.
@AphidGit I'm closing this issue due to inactivity, but if you're still facing this problem building rocSPARSE, feel free to re-open this issue and we can look into it.