
Compile error in conda environment: torch 1.8.1, gcc 9.3.1, python 3.7

Open tiashlee opened this issue 4 years ago • 4 comments

Running python setup.py install throws an error:

Building torch-ccl-1.2.0+8786e24
running install
running bdist_egg
running egg_info
writing torch_ccl.egg-info/PKG-INFO
writing dependency_links to torch_ccl.egg-info/dependency_links.txt
writing top-level names to torch_ccl.egg-info/top_level.txt
reading manifest file 'torch_ccl.egg-info/SOURCES.txt'
writing manifest file 'torch_ccl.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying torch_ccl/version.py -> build/lib.linux-x86_64-3.7/torch_ccl
running build_ext
error: patch failed: third_party/oneCCL/CMakeLists.txt:239
error: third_party/oneCCL/CMakeLists.txt: patch does not apply
error: patch failed: third_party/oneCCL/src/CMakeLists.txt:253
error: third_party/oneCCL/src/CMakeLists.txt: patch does not apply
CMake Error at CMakeLists.txt:10 (find_package):
By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "Torch", but CMake did not find one.

Could not find a package configuration file provided by "Torch" with any of the following names:

TorchConfig.cmake
torch-config.cmake

Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set "Torch_DIR" to a directory containing one of the above files. If "Torch" provides a separate development package or SDK, be sure it has been installed.

-- Configuring incomplete, errors occurred!
See also "/ec/pdx/disks/mlp_lab_home_pool_02/ashleeti/torch-ccl/build/temp.linux-x86_64-3.7.libtorch_ccl/CMakeFiles/CMakeOutput.log".
/nfs/site/home/ashleeti/anaconda3/envs/env/bin/cmake -DBUILD_CONFIG=OFF -DBUILD_EXAMPLES=OFF -DBUILD_FT=OFF -DBUILD_UT=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++ -DCMAKE_C_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/cc -DCMAKE_INSTALL_PREFIX=/ec/pdx/disks/mlp_lab_home_pool_02/ashleeti/torch-ccl/torch_ccl -DCMAKE_PREFIX_PATH=/nfs/site/home/ashleeti/anaconda3/envs/env -DPYTORCH_LIBRARY_DIRS=/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/site-packages/torch/lib -DUSE_CUDA=0 /ec/pdx/disks/mlp_lab_home_pool_02/ashleeti/torch-ccl
Traceback (most recent call last):
  File "setup.py", line 235, in <module>
    'clean': Clean,
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/site-packages/setuptools/command/install.py", line 67, in run
    self.do_egg_install()
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/site-packages/setuptools/command/install.py", line 109, in do_egg_install
    self.run_command('bdist_egg')
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 164, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
    self.run_command(cmdname)
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/distutils/command/install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "setup.py", line 81, in run
    self.build_cmake(ext)
  File "setup.py", line 126, in build_cmake
    extension.generate(build_options, my_env, build_dir, install_dir)
  File "/ec/pdx/disks/mlp_lab_home_pool_02/ashleeti/torch-ccl/tools/setup/cmake.py", line 224, in generate
    self._run(cmake_args, env=env)
  File "/ec/pdx/disks/mlp_lab_home_pool_02/ashleeti/torch-ccl/tools/setup/cmake.py", line 188, in _run
    check_call(command, cwd=self.build_dir, env=env)
  File "/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/nfs/site/home/ashleeti/anaconda3/envs/env/bin/cmake', '-DBUILD_CONFIG=OFF', '-DBUILD_EXAMPLES=OFF', '-DBUILD_FT=OFF', '-DBUILD_UT=OFF', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++', '-DCMAKE_C_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/cc', '-DCMAKE_INSTALL_PREFIX=/ec/pdx/disks/mlp_lab_home_pool_02/ashleeti/torch-ccl/torch_ccl', '-DCMAKE_PREFIX_PATH=/nfs/site/home/ashleeti/anaconda3/envs/env', '-DPYTORCH_LIBRARY_DIRS=/nfs/site/home/ashleeti/anaconda3/envs/env/lib/python3.7/site-packages/torch/lib', '-DUSE_CUDA=0', '/ec/pdx/disks/mlp_lab_home_pool_02/ashleeti/torch-ccl']' returned non-zero exit status 1.

tiashlee avatar Jul 15 '21 21:07 tiashlee

This happens because torch_ccl cannot locate the torch installation in your environment.

Can you install torch explicitly and try again?
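One quick way to check whether torch is visible to the interpreter that runs setup.py is sketched below. This is only an illustration: the helper name `is_installed` is made up for this example and is not part of torch-ccl or torch.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be located by the current interpreter.

    `is_installed` is a hypothetical helper for this sketch only.
    """
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    if is_installed("torch"):
        import torch

        # Print the version and location so they can be compared with the
        # paths that appear in the CMake command line.
        print("torch", torch.__version__, "found at", torch.__file__)
    else:
        print("torch is not installed in this environment")
```

Run it with the same python from the active conda env that you use for `python setup.py install`, so both see the same site-packages.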

chengjunlu avatar Jul 19 '21 01:07 chengjunlu

This seems to have become a common error over the last few days. I am not sure whether a change in the latest conda CMake package is causing it. For now, exporting this environment variable should solve the problem:
export Torch_DIR=$(python -c "import torch; import os; print(os.path.dirname(torch.__file__) + '/share/cmake/Torch');")
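Before exporting the variable, it may be worth confirming that the directory actually contains TorchConfig.cmake, which is the file CMake reports it cannot find. The sketch below does that; `torch_cmake_dir` is a hypothetical helper name, and the `share/cmake/Torch` layout is taken from the export command above rather than verified for every torch build.

```python
import os
from typing import Optional


def torch_cmake_dir(torch_package_dir: str) -> Optional[str]:
    """Return <torch_package_dir>/share/cmake/Torch if it holds TorchConfig.cmake.

    Hypothetical helper for this sketch; returns None when the file is missing.
    """
    candidate = os.path.join(torch_package_dir, "share", "cmake", "Torch")
    if os.path.isfile(os.path.join(candidate, "TorchConfig.cmake")):
        return candidate
    return None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not importable; install it first")
    else:
        found = torch_cmake_dir(os.path.dirname(torch.__file__))
        if found:
            # Same value the shell one-liner computes, but verified first.
            print(f"export Torch_DIR={found}")
        else:
            print("TorchConfig.cmake not found under the torch package")
```

If the script prints an export line, that value can be pasted into the shell before rerunning the build.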

ddkalamk avatar Jul 19 '21 02:07 ddkalamk

@ddkalamk Thanks for the information. I will check the install issue with the latest conda package.

chengjunlu avatar Jul 19 '21 02:07 chengjunlu

Thanks @chengjunlu. To be more precise, these are the steps I used to set up the conda env and install cmake. (Most likely the other packages are irrelevant to this issue, but I kept them in case there is a dependency.)

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p ./miniconda3
miniconda3/bin/conda create -y -n pt python=3.8
source miniconda3/bin/activate pt
conda install -y numpy ninja pyyaml mkl mkl-include setuptools cmake cffi jemalloc tqdm future pydot scikit-learn
conda install -y -c intel numpy
conda install -y -c eumetsat expect
conda install -y -c conda-forge gperftools onnx tensorboardx libunwind
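Since the suspicion is a change in the conda CMake package, it may help to record which cmake binary and version the env resolves to after these steps. A minimal sketch follows, assuming `cmake --version` prints its usual first line of the form "cmake version X.Y.Z"; `parse_cmake_version` is a made-up helper name.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_cmake_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from `cmake --version` output.

    Hypothetical helper for this sketch; returns None if no version is found.
    """
    match = re.search(r"cmake version (\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


if __name__ == "__main__":
    cmake = shutil.which("cmake")
    if cmake is None:
        print("cmake is not on PATH")
    else:
        result = subprocess.run(
            [cmake, "--version"], capture_output=True, text=True
        )
        # Report the resolved binary so a conda cmake can be told apart
        # from a system one.
        print(cmake, parse_cmake_version(result.stdout))
```

Posting the printed path and version alongside the error makes it easier to compare a working env with a failing one.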

ddkalamk avatar Jul 19 '21 02:07 ddkalamk