
Problems building mxnet

Open kcperry opened this issue 8 years ago • 0 comments

For bugs or installation issues, please provide the following information. The more information you provide, the more likely people will be able to help you.

Environment info

Operating System: Ubuntu 16.04.3 LTS
Compiler: hipcc / hcc (clang 6, see version output below)

hipcc --version
HIP version: 1.4.17494
HCC clang version 6.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 42ceed861a212d9bd0aef883ee7981144f3ecc02) (ssh://gerritgit/compute/ec/hcc-tot/llvm 23e086be6f627e6e983c6789d2e77da6bf85ebb6) (based on HCC 1.1.17493-2f85d8a-42ceed8-23e086b )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin

Package used (Python/R/Scala/Julia):

MXNet version:

Or if installed from source:

MXNet commit hash (git rev-parse HEAD): d053ae86d5327ca36315b9a0646989678fff335d

If you are using python package, please provide

Python version and distribution:

If you are using R package, please provide

R sessionInfo():

Error Message:

Please paste the full error message, including stack trace.

The initial issue was that, with the latest ROCm (1.7.60) installed from the repositories, there was a problem with rocBLAS and hcRNG was missing, so I built both from git. hcFFT was available as expected. At this point mxnet appears to compile, but multiple errors are reported. I have attached a build log from the second build attempt so that it is less noisy.

I am also using CUDA 9.1, but I did try CUDA 8, which also failed. The environment variables in both cases were:

LD_LIBRARY_PATH=/usr/local/cuda/lib64 (this is symlinked to CUDA 8 or 9.1, depending on which is installed)
HIP_PLATFORM=hcc
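For anyone trying to reproduce this, the environment described above could be set up roughly as follows. This is a minimal sketch: the two exported values are taken from the report, while the symlink check is an addition and assumes that /usr/local/cuda is the symlink that selects the CUDA version.

```shell
# Environment variables as listed in the report.
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
export HIP_PLATFORM=hcc

# Assumption: /usr/local/cuda is the symlink that points at the
# CUDA 8 or CUDA 9.1 install. Print where it currently resolves to.
if [ -e /usr/local/cuda ]; then
    readlink -f /usr/local/cuda
else
    echo "/usr/local/cuda not found"
fi
```

Recording the resolved path next to each build log makes it clear which CUDA version a given failure belongs to.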

The current git version of mxnet also does not need the Makefile modification presented, since it is already there.

build.log

Minimum reproducible example

if you are using your own code, please provide a short script that reproduces the error.

Steps to reproduce

or if you are running standard examples, please provide the commands you have run that lead to the error.

1. make -j $(nproc)
2.
3.
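Because a parallel build interleaves output from several compiler processes, it can help to save the log and pull out just the failure lines. The following is a rough sketch; `first_errors` is a hypothetical helper (not part of mxnet or ROCm), and the two patterns it searches for are assumptions based on the messages quoted in this report.

```shell
# Hypothetical helper: print the first few failure lines found in a
# saved build log, each prefixed with its line number in that log.
first_errors() {
    # 'error:' matches clang/hcc style diagnostics;
    # 'Died at' matches the message printed by a Perl die().
    grep -n -E 'error:|Died at' "$1" | head -n 5
}

# Example usage (commented out so that running this file does not start a build):
#   make -j "$(nproc)" 2>&1 | tee build.log
#   first_errors build.log
```

Re-running with a single job (`make -j1`) after the first failure usually makes it easier to see which source file the errors come from.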

What have you tried to solve it?

The first stoppage in the log...

41 warnings and 2 errors generated.
Died at /opt/rocm/bin/hipcc line 500

...refers to a line in the hipcc script...

495 if ($runCmd) {
496     if ($HIP_PLATFORM eq "hcc" and exists($hipConfig{'HCC_VERSION'}) and $HCC_VERSION ne $hipConfig{'HCC_VERSION'}) {
497         print ("HIP ($HIP_PATH) was built using hcc $hipConfig{'HCC_VERSION'}, but you are using $HCC_HOME/hcc with version $HCC_VERSION from hipcc. Please rebuild HIP including cmake or update HCC_HOME variable.\n") ;
498         die unless $ENV{'HIP_IGNORE_HCC_VERSION'};
499     }
500     system ("$CMD") and die ();
501 }

However, my HIP configuration appears to be good...

hipconfig
HIP version : 1.4.17494

== hipconfig
HIP_PATH : /opt/rocm
HIP_PLATFORM : hcc
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -I/opt/rocm/include -I/opt/rocm/hcc/include

== hcc
HSA_PATH : /opt/rocm/hsa
HCC_HOME : /opt/rocm/hcc
HCC clang version 6.0.0 (ssh://gerritgit/compute/ec/hcc-tot/clang 42ceed861a212d9bd0aef883ee7981144f3ecc02) (ssh://gerritgit/compute/ec/hcc-tot/llvm 23e086be6f627e6e983c6789d2e77da6bf85ebb6) (based on HCC 1.1.17493-2f85d8a-42ceed8-23e086b )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
  LLVM version 6.0.0svn
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600 - AMD GPUs HD2XXX-HD6XXX
    x86 - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags : -hc -std=c++amp -I/opt/rocm/hcc-1.0/include -I/opt/rocm/include
HCC-ldflags : -hc -std=c++amp -L/opt/rocm/hcc-1.0/lib -Wl,--rpath=/opt/rocm/hcc-1.0/lib -ldl -lm -lpthread -lunwind -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive

=== Environment Variables
PATH=/opt/rocm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH=/usr/local/cuda/lib64
HIP_PLATFORM=hcc

== Linux Kernel
Hostname : Linux 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
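A note on reading the hipcc excerpt quoted earlier: in Perl, `system` returns the command's exit status, so `system ("$CMD") and die ();` dies whenever the underlying compiler command exits non-zero. Dying at line 500 rather than at line 498 therefore suggests that the HCC version check did not trigger and that the stop is simply the compile failure ("2 errors generated") being propagated. The shell sketch below mirrors that behaviour; `run_or_die` is a hypothetical name used only for illustration and is not part of hipcc.

```shell
# Hypothetical shell analogue of hipcc line 500: run the given command
# and, if it exits non-zero, report that and pass the status along.
run_or_die() {
    "$@" || {
        status=$?
        echo "Died: '$*' exited with status $status" >&2
        return "$status"
    }
}

# Example usage (commented out; any failing command behaves the same way):
#   run_or_die false    # prints "Died: 'false' exited with status 1"
```

If that reading is right, the two compiler errors earlier in the build log are the place to look, not the HIP/HCC version configuration.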


I'm not sure what to try next. My guess is that some function signatures differ between the mxnet code and the libraries it depends on, but I don't know how to resolve that.

kcperry • Jan 16 '18 04:01