Tests fail after a successful "pip install ." command
I have a GPU:
nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L40                     On  | 00000000:04:00.0 Off |                    0 |
| N/A   29C    P8              33W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
+---------------------------------------------------------------------------------------+
I have executed:
git clone <extension...>
cd extension_cpp
pip install .
python test/test_extension.py
and I get all 8 tests failing:
Fail to import hypothesis in common_utils, tests are not derandomized
EEEEEEEE
======================================================================
ERROR: test_opcheck_cpu (__main__.TestMyAddOut)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/optests/generate_tests.py", line 660, in opcheck
tester(op, args, kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/optests/generate_tests.py", line 60, in safe_schema_check
result = op(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 667, in __call__
return self_._op(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_subclasses/schema_check_mode.py", line 156, in __torch_dispatch__
out = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 667, in __call__
return self_._op(*args, **kwargs)
NotImplementedError: Could not run 'extension_cpp::myadd_out' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'extension_cpp::myadd_out' is only available for these backends: [HIP, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
HIP: registered at extension_cpp/csrc/cuda/muladd.cu:82 [kernel]
Meta: registered at ../aten/src/ATen/core/MetaFallbackKernel.cpp:23 [backend fallback]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:497 [backend fallback]
Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:349 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:53 [backend fallback]
AutogradCPU: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:57 [backend fallback]
AutogradCUDA: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:65 [backend fallback]
AutogradXLA: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:69 [backend fallback]
AutogradMPS: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:77 [backend fallback]
AutogradXPU: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:61 [backend fallback]
AutogradHPU: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:90 [backend fallback]
AutogradLazy: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:73 [backend fallback]
AutogradMeta: registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:81 [backend fallback]
Tracer: registered at ../torch/csrc/autograd/TraceTypeManual.cpp:297 [backend fallback]
AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:209 [backend fallback]
AutocastXPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:351 [backend fallback]
AutocastCUDA: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:165 [backend fallback]
FuncTorchBatched: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:731 [backend fallback]
BatchedNestedTensor: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:758 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ../aten/src/ATen/functorch/VmapModeRegistrations.cpp:27 [backend fallback]
Batched: registered at ../aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ../aten/src/ATen/functorch/TensorWrapper.cpp:207 [backend fallback]
PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:493 [backend fallback]
PreDispatch: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:157 [backend fallback]
More errors...
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
----------------------------------------------------------------------
Ran 8 tests in 0.161s
FAILED (errors=8)
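A minimal way to reproduce the failure outside the test harness (just a sketch; the op name and signature follow the extension_cpp tutorial repo, so adjust if your checkout differs):

# Calling the op directly shows which backend kernels were actually built and registered.
import torch
import extension_cpp  # importing this registers the extension_cpp ops

a, b = torch.randn(8), torch.randn(8)
out = torch.empty(8)
torch.ops.extension_cpp.myadd_out(a, b, out)  # raises NotImplementedError if the CPU kernel is missing
print(out)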
I have the same problem
I have the same problem
I also have the same problem on Windows, but on Ubuntu it works.
Hey, did anyone find a solution to this?
I solved the 'ERROR: test_opcheck_cpu (__main__.TestMyAddOut)' issue by creating a new environment, upgrading the NVIDIA driver to the newest version, and using PyTorch 2.5.0 with CUDA 12.4 and Python 3.12.
But now I get a new error:
return self._op(*args, **kwargs)
       ^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Could not run 'extension_cpp::myadd_out' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'extension_cpp::myadd_out' is only available for these backends: [CPU, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
My GPU is a 3090 Ti.
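For anyone hitting the same CUDA-backend error, a quick sanity check (just a sketch, using only standard torch calls) is to confirm that the CUDA version your PyTorch wheel was built against matches the toolkit used to compile the extension, and that the driver is visible:

# The versions printed here should line up with the toolkit used when building extension_cpp.
import torch

print(torch.__version__)          # e.g. 2.5.0
print(torch.version.cuda)         # e.g. 12.4; None means a CPU-only wheel
print(torch.cuda.is_available())  # should be True with a working driver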
@jia-heng I also ran into this issue; in my case it was because PyTorch could not automatically find where CUDA is installed. In setup.py you can see that if CUDA_HOME from torch.utils.cpp_extension is None, CUDA is not used. This can happen when CUDA is installed in an unusual location, as it was for me. You can work around it by setting the environment variable CUDA_HOME to the location of your CUDA install before running pip install . again.
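A minimal way to check this (a sketch; the path in the comments is only an example, adjust it to your system):

# If this prints None, setup.py in extension_cpp will build only the CPU kernels.
from torch.utils.cpp_extension import CUDA_HOME

print("CUDA_HOME =", CUDA_HOME)
# If it is None, point CUDA_HOME at your toolkit before rebuilding,
# e.g. export CUDA_HOME=/usr/local/cuda-12.4 (example path), then rerun pip install .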
For me, there were some problems with torch 2.4.0 built from source, so I decided to upgrade by reinstalling libtorch and the latest PyTorch, 2.5.1. That solved the problem.
I had the issue with torch 2.5.0 and tried it with different CUDA versions, none of which worked, but torch 2.4.1+cu124 with CUDA 12.4.0, built from source, works on Ubuntu 24.04.