
Cog keeps defaulting to hipcc, whereas I have NVIDIA GPUs

Open shivamgcodes opened this issue 10 months ago • 0 comments

I am trying to rebuild an image from Replicate. I aim to make some minor changes, but let's ignore those for now. My workflow: start the Docker container, extract the src folder from the container, then try to build that src with cog.

My system has nvidia-smi, nvtop, and nvcc installed, so that is not the issue. Link to the Replicate image: https://replicate.com/wty-ustc/hairclip

This is the traceback:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/cog/command/openapi_schema.py", line 45, in <module>
    raise CogError(app.state.setup_result.logs)
cog.errors.CogError: ['Error while loading predictor:\n\nTraceback (most recent call last):\n  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build\n    subprocess.run(\n  File "/root/.pyenv/versions/3.8.20/lib/python3.8/subprocess.py", line 516, in run\n    raise CalledProcessError(retcode, process.args,\nsubprocess.CalledProcessError: Command \'[\'ninja\', \'-v\']\' returned non-zero exit status 1.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/cog/server/http.py", line 159, in create_app\n    InputType, OutputType, is_async = cog_config.get_predictor_types(\n  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/cog/config.py", line 171, in get_predictor_types\n    predictor = self._load_predictor_for_types(\n  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/cog/config.py", line 147, in _load_predictor_for_types\n    module = load_full_predictor_from_file(module_path, module_name)\n  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/cog/predictor.py", line 151, in load_full_predictor_from_file\n    spec.loader.exec_module(module)\n  File "<frozen importlib._bootstrap_external>", line 843, in exec_module\n  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed\n  File "predict.py", line 14, in <module>\n    from models.psp import pSp\n  File "/src/encoder4editing/models/psp.py", line 6, in <module>\n    from models.encoders import psp_encoders\n  File "/src/encoder4editing/models/encoders/psp_encoders.py", line 9, in <module>\n    from models.stylegan2.model import EqualLinear\n  File "/src/encoder4editing/models/stylegan2/model.py", line 7, in <module>\n    from models.stylegan2.op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d\n  File "/src/encoder4editing/models/stylegan2/op/__init__.py", line 1, in 
<module>\n    from .fused_act import FusedLeakyReLU, fused_leaky_relu\n  File "/src/encoder4editing/models/stylegan2/op/fused_act.py", line 9, in <module>\n    fused = load(\n  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1124, in load\n    return _jit_compile(\n  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1337, in _jit_compile\n    _write_ninja_file_and_build_library(\n  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1449, in _write_ninja_file_and_build_library\n    _run_ninja_build(\n  File "/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build\n    raise RuntimeError(message) from e\nRuntimeError: Error building extension \'fused\': [1/3] bin/hipcc  -DWITH_HIP -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\\"_gcc\\" -DPYBIND11_STDLIB=\\"_libstdcpp\\" -DPYBIND11_BUILD_ABI=\\"_cxxabi1011\\" -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/TH -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/THC -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/THH -isystem include -isystem /root/.pyenv/versions/3.8.20/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -fPIC -D__HIP_PLATFORM_HCC__=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 -fno-gpu-rdc -c /src/encoder4editing/models/stylegan2/op/fused_bias_act_kernel.hip -o fused_bias_act_kernel.cuda.o \nFAILED: fused_bias_act_kernel.cuda.o \nbin/hipcc  
-DWITH_HIP -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\\"_gcc\\" -DPYBIND11_STDLIB=\\"_libstdcpp\\" -DPYBIND11_BUILD_ABI=\\"_cxxabi1011\\" -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/TH -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/THC -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/THH -isystem include -isystem /root/.pyenv/versions/3.8.20/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -fPIC -D__HIP_PLATFORM_HCC__=1 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 -fno-gpu-rdc -c /src/encoder4editing/models/stylegan2/op/fused_bias_act_kernel.hip -o fused_bias_act_kernel.cuda.o \n/bin/sh: 1: bin/hipcc: not found\n[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\\"_gcc\\" -DPYBIND11_STDLIB=\\"_libstdcpp\\" -DPYBIND11_BUILD_ABI=\\"_cxxabi1011\\" -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/TH -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/THC -isystem /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/THH -isystem include -isystem /root/.pyenv/versions/3.8.20/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /src/encoder4editing/models/stylegan2/op/fused_bias_act.cpp -o fused_bias_act.o \nIn file included from 
/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/ATen/ATen.h:7,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/extension.h:4,\n                 from /src/encoder4editing/models/stylegan2/op/fused_bias_act.cpp:1:\n/src/encoder4editing/models/stylegan2/op/fused_bias_act.cpp: In function ‘at::Tensor fused_bias_act(const at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float)’:\n/src/encoder4editing/models/stylegan2/op/fused_bias_act.cpp:7:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. 
Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]\n    7 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")\n      |                                          ^\n/src/encoder4editing/models/stylegan2/op/fused_bias_act.cpp:13:5: note: in expansion of macro ‘CHECK_CUDA’\n   13 |     CHECK_CUDA(input);\n      |     ^~~~~~~~~~\nIn file included from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,\n                 
from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/extension.h:4,\n                 from /src/encoder4editing/models/stylegan2/op/fused_bias_act.cpp:1:\n/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:194:30: note: declared here\n  194 |   DeprecatedTypeProperties & type() const {\n      |                              ^~~~\nIn file included from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/ATen/ATen.h:7,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/extension.h:4,\n                 from 
/src/encoder4editing/models/stylegan2/op/fused_bias_act.cpp:1:\n/src/encoder4editing/models/stylegan2/op/fused_bias_act.cpp:7:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]\n    7 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")\n      |                                          ^\n/src/encoder4editing/models/stylegan2/op/fused_bias_act.cpp:14:5: note: in expansion of macro ‘CHECK_CUDA’\n   14 |     CHECK_CUDA(bias);\n      |     ^~~~~~~~~~\nIn file included from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,\n                 from 
/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,\n                 from /root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/torch/extension.h:4,\n                 from /src/encoder4editing/models/stylegan2/op/fused_bias_act.cpp:1:\n/root/.pyenv/versions/3.8.20/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:194:30: note: declared here\n  194 |   DeprecatedTypeProperties & type() const {\n      |                              ^~~~\nninja: build stopped: subcommand failed.\n\n']

ⅹ Failed to get type signature: exit status 1
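In the log above, ninja invokes `bin/hipcc` with `-DWITH_HIP`, which is the toolchain a ROCm-flavored build would use, and the shell then reports that `bin/hipcc` is not found. One way to narrow this down might be to run a small check inside the same environment where cog loads `predict.py`, since the tools installed on the host are not necessarily the ones visible in the container. The sketch below is only a diagnostic under that assumption, not a fix: the function name `gpu_toolchain_report` is made up for illustration, and it reads `torch.version.cuda` and `torch.version.hip` only if `torch` can be imported.

```python
import importlib
import os
import shutil


def gpu_toolchain_report():
    """Collect hints about which GPU toolchain an extension build might pick.

    Hypothetical helper for illustration; it only reads state and changes nothing.
    """
    report = {
        # Compilers visible on PATH in this environment (None if absent).
        "nvcc": shutil.which("nvcc"),
        "hipcc": shutil.which("hipcc"),
        # Environment variables commonly used to locate the toolkits.
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "ROCM_HOME": os.environ.get("ROCM_HOME"),
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed here, so there is nothing more to report.
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    # A CUDA build normally reports a version string here and None for hip;
    # a ROCm build normally reports the opposite.
    report["torch.version.cuda"] = torch.version.cuda
    report["torch.version.hip"] = getattr(torch.version, "hip", None)
    return report


if __name__ == "__main__":
    for key, value in gpu_toolchain_report().items():
        print(f"{key}: {value}")
```

If `torch.version.hip` comes back non-empty, or a `ROCM_HOME` value is set, that would suggest the hipcc choice comes from the PyTorch install or the environment inside the image rather than from the host GPU setup.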

shivamgcodes, Apr 02 '25 19:04