Unable to Compile hash encoder

Open wongsinglam opened this issue 1 year ago • 12 comments

Hi,

I am running into a strange problem related to hash_encoder:

Traceback (most recent call last):
  File "/home//projects/objectsdf_plus/code/training/exp_runner.py", line 62, in <module>
    trainrunner = ObjectSDFPlusTrainRunner(conf=opt.conf,
  File "/home//projects/objectsdf_plus/code/../code/training/objectsdfplus_train.py", line 112, in __init__
    self.model = utils.get_class(self.conf.get_string('train.model_class'))(conf=conf_model)
  File "/home//projects/objectsdf_plus/code/../code/utils/general.py", line 17, in get_class
    m = __import__(module)
  File "/home//projects/objectsdf_plus/code/../code/model/network.py", line 172, in <module>
    from hashencoder.hashgrid import HashEncoder
  File "/home//projects/objectsdf_plus/code/../code/hashencoder/__init__.py", line 1, in <module>
    from .hashgrid import HashEncoder
  File "/home//projects/objectsdf_plus/code/../code/hashencoder/hashgrid.py", line 12, in <module>
    from .backend import _backend
  File "/home//projects/objectsdf_plus/code/../code/hashencoder/backend.py", line 10, in <module>
    _backend = load(name='_hash_encoder',
  File "/mnt/sfs_turbo//miniconda3/envs/objectpp/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/mnt/sfs_turbo//miniconda3/envs/objectpp/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/mnt/sfs_turbo//miniconda3/envs/objectpp/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/mnt/sfs_turbo/*/miniconda3/envs/objectpp/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder'

Environment:

I have already tried CUDA toolkit 11.7.0 and 11.7.1. I am using the cuda-toolkit package from the nvidia channel in conda, and I am not sure whether that is related to my problem. The objsdf repository works fine on my machine!

wongsinglam avatar Nov 13 '24 08:11 wongsinglam

Hi,

Thanks for your interest in our work. I cannot tell what the compile bug is from this log. Would you mind providing more of the log, along with the specs of your environment and system?

You can also refer to the issue here for a possible solution.

QianyiWu avatar Nov 13 '24 18:11 QianyiWu

[3/3] c++ hashencoder.cuda.o bindings.o -shared -L/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home//Applications/miniconda3/envs/gsrec/lib64 -lcudart -o _hash_encoder.so
FAILED: _hash_encoder.so 
c++ hashencoder.cuda.o bindings.o -shared -L/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home//Applications/miniconda3/envs/gsrec/lib64 -lcudart -o _hash_encoder.so
/usr/bin/ld: cannot find -lcudart: No such file or directory
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1814, in _run_ninja_build
    env=env)
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 29, in <module>
    from gaussian_renderer import prefilter_voxel, render, network_gui
  File "/home//projects/gsrec/gaussian_renderer/__init__.py", line 17, in <module>
    from scene.gaussian_model_implicit import GaussianModel
  File "/home//projects/gsrec/scene/__init__.py", line 17, in <module>
    from scene.gaussian_model_implicit import GaussianModel
  File "/home//projects/gsrec/scene/gaussian_model_implicit.py", line 47, in <module>
    from hashencoder.hashgrid import HashEncoder
  File "/home//projects/gsrec/hashencoder/__init__.py", line 1, in <module>
    from .hashgrid import HashEncoder
  File "/home//projects/gsrec/hashencoder/hashgrid.py", line 10, in <module>
    from .backend import _backend
  File "/home//projects/gsrec/hashencoder/backend.py", line 21, in <module>
    verbose=True,
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1214, in load
    keep_intermediates=keep_intermediates)
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1435, in _jit_compile
    is_standalone=is_standalone)
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1540, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home//Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder'

Hi, thank you for your reply. I hit the same problem with the gsrec repository.

I have not installed the CUDA toolkit system-wide on my machine. I only installed the conda package cuda-toolkit from the nvidia channel, not cudatoolkit. As a result, it seems "-lcudart" cannot be found.

I am not sure whether there is any other solution that does not require installing CUDA on the machine itself. Thanks!!!
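One way to confirm the "-lcudart" diagnosis is to check whether any directory passed with `-L` in the failing link command actually contains `libcudart.so` or `libcudart.a`, since those are the file names the linker looks for when it sees `-lcudart`. The following is a minimal sketch, not part of either repository; the helper names `library_dirs` and `dirs_providing` are hypothetical.

```python
import shlex
from pathlib import Path


def library_dirs(link_command):
    """Return the directories passed via -L flags, in command order."""
    tokens = shlex.split(link_command)
    dirs = []
    for i, token in enumerate(tokens):
        if token == "-L" and i + 1 < len(tokens):
            # Form: -L <dir>
            dirs.append(tokens[i + 1])
        elif token.startswith("-L") and len(token) > 2:
            # Form: -L<dir>
            dirs.append(token[2:])
    return dirs


def dirs_providing(lib_name, dirs):
    """Return the directories that contain lib<name>.so or lib<name>.a."""
    hits = []
    for d in dirs:
        root = Path(d)
        shared = root / f"lib{lib_name}.so"
        static = root / f"lib{lib_name}.a"
        if shared.exists() or static.exists():
            hits.append(d)
    return hits


if __name__ == "__main__":
    # Paste the failing "c++ ... -lcudart -o _hash_encoder.so" line here.
    command = "c++ hashencoder.cuda.o bindings.o -shared -L/usr/local/cuda/lib64 -lcudart -o _hash_encoder.so"
    found = dirs_providing("cudart", library_dirs(command))
    print("libcudart found in:", found or "none of the -L directories")
```

If the list comes back empty, the library may still exist elsewhere (for example under a different directory of the conda environment), in which case the build is simply searching the wrong place.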

wongsinglam avatar Nov 26 '24 07:11 wongsinglam

Hi, reporting again.

After installing cuda-toolkit 11.8 directly on the machine, I hit a new problem:

Detected CUDA files, patching ldflags
Emitting ninja build file ./tmp_build/build.ninja...
Building extension module _hash_encoder...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF bindings.o.d -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -c /home/wsl/projects/gsrec/hashencoder/src/bindings.cpp -o bindings.o 
[2/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -std=c++14 -allow-unsupported-compiler -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c /home/wsl/projects/gsrec/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
FAILED: hashencoder.cuda.o 
/usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/TH -isystem /home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/wsl/Applications/miniconda3/envs/gsrec/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -std=c++14 -allow-unsupported-compiler -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -c /home/wsl/projects/gsrec/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
/usr/include/c++/12/bits/locale_facets_nonio.tcc: In member function ‘_InIter std::time_get<_CharT, _InIter>::get(iter_type, iter_type, std::ios_base&, std::ios_base::iostate&, tm*, const char_type*, const char_type*) const’:
/usr/include/c++/12/bits/locale_facets_nonio.tcc:1477:77: error: invalid type argument of unary ‘*’ (have ‘int’)
 1477 |       if ((void*)(this->*(&time_get::do_get)) == (void*)(&time_get::do_get))
      |                                                                             ^   
/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/pybind11/cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/pybind11/cast.h:951:120: error: expected template-name before ‘<’ token
  951 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                        ^
/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/pybind11/cast.h:951:120: error: expected identifier before ‘<’ token
/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/pybind11/cast.h:951:123: error: expected primary-expression before ‘>’ token
  951 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                           ^
/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/include/pybind11/cast.h:951:126: error: expected primary-expression before ‘)’ token
  951 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                              ^
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1814, in _run_ninja_build
    env=env)
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 29, in <module>
    from gaussian_renderer import prefilter_voxel, render, network_gui
  File "/home/wsl/projects/gsrec/gaussian_renderer/__init__.py", line 17, in <module>
    from scene.gaussian_model_implicit import GaussianModel
  File "/home/wsl/projects/gsrec/scene/__init__.py", line 17, in <module>
    from scene.gaussian_model_implicit import GaussianModel
  File "/home/wsl/projects/gsrec/scene/gaussian_model_implicit.py", line 47, in <module>
    from hashencoder.hashgrid import HashEncoder
  File "/home/wsl/projects/gsrec/hashencoder/__init__.py", line 1, in <module>
    from .hashgrid import HashEncoder
  File "/home/wsl/projects/gsrec/hashencoder/hashgrid.py", line 10, in <module>
    from .backend import _backend
  File "/home/wsl/projects/gsrec/hashencoder/backend.py", line 21, in <module>
    verbose=True,
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1214, in load
    keep_intermediates=keep_intermediates)
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1435, in _jit_compile
    is_standalone=is_standalone)
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1540, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/wsl/Applications/miniconda3/envs/gsrec/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder'

wongsinglam avatar Nov 27 '24 01:11 wongsinglam

Hi, would you mind providing more information about your OS, GPU and your own CUDA version?
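The details requested here can be gathered in one go. The sketch below is only an illustration and assumes `gcc` and `nvcc` are the relevant tools; the helper names `tool_output` and `collect_report` are hypothetical, and a tool that is not on `PATH` is reported as `None`.

```python
import platform
import shutil
import subprocess


def tool_output(tool, flag="--version"):
    """Return the tool's version text, or None if it is missing or fails to run."""
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, flag], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return (result.stdout or result.stderr).strip() or None


def collect_report():
    """Collect OS, Python, host compiler and CUDA compiler information."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "gcc": tool_output("gcc"),
        "nvcc": tool_output("nvcc"),
    }


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

If PyTorch is installed, `torch.__version__` and `torch.version.cuda` can be added to the report as well.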

QianyiWu avatar Nov 27 '24 01:11 QianyiWu

Ubuntu 22.04, RTX 3090, NVIDIA-SMI 550.120, cuda-toolkit 11.8 (tried both a system-wide install and the conda package from the nvidia channel).

Hope it helps

wongsinglam avatar Nov 27 '24 01:11 wongsinglam

And what is your pytorch version?

QianyiWu avatar Nov 27 '24 01:11 QianyiWu

I am using the PyTorch version you specified for gsrec. For objsdf_pp, I am also using the PyTorch 2.0.0 you specified.

wongsinglam avatar Nov 27 '24 01:11 wongsinglam

Which cuda-toolkit version are you using? Knowing that would make it easier for me to track down the problem as well.

wongsinglam avatar Nov 27 '24 01:11 wongsinglam

Hi, I think it may be related to the C/C++ compiler version, since I get different errors when I switch compiler versions. Could you please share your C/C++ compiler version?

wongsinglam avatar Nov 27 '24 05:11 wongsinglam

I remember my GCC version was not higher than 11 in these projects.
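A quick way to check a machine against this is to compare the major version reported by `gcc -dumpversion` with 11. The sketch below is an illustration only: the limit of 11 is taken from the comment above rather than from any official compatibility table, and the helper names `gcc_major` and `is_supported` are hypothetical.

```python
import shutil
import subprocess

# Assumption: upper bound taken from the comment above, not an official limit.
MAX_GCC_MAJOR = 11


def gcc_major(dumpversion_output):
    """Parse the major version from `gcc -dumpversion` output such as '11.4.0' or '12'."""
    return int(dumpversion_output.strip().split(".")[0])


def is_supported(dumpversion_output, max_major=MAX_GCC_MAJOR):
    """Return True if the reported GCC major version is not higher than max_major."""
    return gcc_major(dumpversion_output) <= max_major


if __name__ == "__main__":
    gcc = shutil.which("gcc")
    if gcc is None:
        print("gcc not found on PATH")
    else:
        out = subprocess.run(
            [gcc, "-dumpversion"], capture_output=True, text=True
        ).stdout
        status = "ok" if is_supported(out) else "too new"
        print(f"gcc {out.strip()}: {status}")
```

The second log in this thread includes headers from `/usr/include/c++/12`, which suggests the host compiler in that build was GCC 12.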

QianyiWu avatar Nov 27 '24 05:11 QianyiWu

Hi,

I have run into the "cannot find -lcudart" issue in other projects, and I solved it by running export CUDA_HOME=/usr/local/cuda and recompiling. I noticed you faced this issue earlier, so I hope this helps.
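The same setting can be applied from Python when launching the training script, which avoids changing the shell profile. This is a minimal sketch under the assumption that the toolkit lives in `/usr/local/cuda`; the helper names `env_with_cuda_home` and `has_nvcc` are hypothetical.

```python
import os
from pathlib import Path


def env_with_cuda_home(cuda_home, base_env=None):
    """Return a copy of the environment with CUDA_HOME pointing at cuda_home."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_HOME"] = str(cuda_home)
    return env


def has_nvcc(cuda_home):
    """Return True if <cuda_home>/bin/nvcc exists, a rough sign of a full toolkit install."""
    return (Path(cuda_home) / "bin" / "nvcc").is_file()


if __name__ == "__main__":
    cuda_home = "/usr/local/cuda"  # assumption: default system-wide install path
    if not has_nvcc(cuda_home):
        print(f"no nvcc under {cuda_home}; CUDA_HOME would point at an incomplete install")
    else:
        env = env_with_cuda_home(cuda_home)
        print("CUDA_HOME =", env["CUDA_HOME"])
        # The training script could then be started with this environment, e.g.
        # subprocess.run([sys.executable, "train.py"], env=env)
```

Because the variable is set before the child process starts, the extension build in that process sees it from the beginning.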

QianyiWu avatar Nov 28 '24 01:11 QianyiWu

Thank you for your reply. Yes, the issue is related to CUDA. Keeping the C/C++ compiler version under 11 and installing the CUDA toolkit directly on the machine solved my problem.

wongsinglam avatar Nov 28 '24 01:11 wongsinglam