
cuDNN, cuFFT, and cuBLAS Errors

Open joshuacuellar1 opened this issue 2 years ago • 203 comments

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

GIT_VERSION:v2.14.0-rc1-21-g4dacf3f368e VERSION:2.14.0

Custom code

No

OS platform and distribution

WSL2 Linux Ubuntu 22

Mobile device

No response

Python version

3.10, but I can try different versions

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

CUDA version: 11.8, cuDNN version: 8.7

GPU model and memory

NVIDIA Geforce GTX 1660 Ti, 8GB Memory

Current behavior?

When I run the GPU test from the TensorFlow install instructions, I get several errors and warnings. I don't care about the NUMA stuff, but the first 3 errors are that TensorFlow was not able to load cuDNN. I would really like to be able to use it to speed up training some RNNs and FFNNs. I do get my GPU in the list of physical devices, so I can still train, but not as fast as with cuDNN.

Standalone code to reproduce the issue

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Relevant log output

2023-10-09 13:36:23.355516: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-09 13:36:23.355674: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-09 13:36:23.355933: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-09 13:36:23.413225: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-09 13:36:25.872586: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-09 13:36:25.916952: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-09 13:36:25.917025: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

joshuacuellar1 avatar Oct 09 '23 18:10 joshuacuellar1

Hi @Ke293-x2Ek-Qe-7-aE-B ,

Starting from TF 2.14, TensorFlow provides a CUDA package which can install all the cuDNN, cuFFT, and cuBLAS libraries.

You can use the pip install tensorflow[and-cuda] command for that.

Please try this command and let us know if it helps. Thank you!
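For reference, the full sequence in a clean virtual environment might look like the sketch below (the environment name tf-env is arbitrary, and the quotes around the extras matter in some shells, e.g. zsh):

```shell
# Create and activate a fresh virtual environment (name is arbitrary).
python3 -m venv tf-env
source tf-env/bin/activate

# Quote the package spec so the shell does not try to glob the brackets.
pip install "tensorflow[and-cuda]"

# Verify the GPU is visible and the wheel was built with CUDA support.
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU')); print(tf.test.is_built_with_cuda())"
```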

SuryanarayanaY avatar Oct 10 '23 09:10 SuryanarayanaY

@SuryanarayanaY I did not know that it now comes bundled with cuDNN. I installed TensorFlow with the [and-cuda] part, though, but I also installed the CUDA toolkit and cuDNN separately. I will try installing just the CUDA toolkit and then installing tensorflow[and-cuda]. Also, is there a way to install TensorFlow for GPU without it coming with cuDNN? If I just pip install tensorflow, will that install with GPU support, just without cuDNN, so that I can install it manually? I don't really need to, but I am curious whether it can be installed that way too.

joshuacuellar1 avatar Oct 10 '23 13:10 joshuacuellar1

@SuryanarayanaY I tried several times, reinstalling Ubuntu, but it still doesn't work.

joshuacuellar1 avatar Oct 10 '23 23:10 joshuacuellar1

I also have the same issue, and it does not seem to be due to the CUDA environment, as I rebuilt CUDA and cuDNN to match tf-2.14.0.

This is the log output I get from: python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2023-10-11 18:21:57.387396: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-10-11 18:21:57.415774: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-11 18:21:57.415847: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-11 18:21:57.415877: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-11 18:21:57.421400: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-11 18:21:58.155058: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-10-11 18:21:59.113217: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-11 18:21:59.152044: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-11 18:21:59.152153: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:880] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

AthiemoneZero avatar Oct 11 '23 10:10 AthiemoneZero

@AthiemoneZero Because it still does output a GPU device at the bottom of the log, I am training on GPU, just without cuDNN. It will be slower, but it is better than nothing or training on CPU.

joshuacuellar1 avatar Oct 11 '23 14:10 joshuacuellar1

@AthiemoneZero Because it still does output a GPU device at the bottom of the log, I am training on GPU, just without cuDNN. It will be slower, but it is better than nothing or training on CPU.

Yeah. But I just found that when I downgrade to version 2.13.0, the registration errors no longer appear. It looks like this:

(TF) ephys3@ZhouLab-Ephy3:~$ python3 -c "import tensorrt as trt;import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2023-10-11 20:39:12.097457: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-10-11 20:39:12.130250: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-11 20:39:13.856721: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-11 20:39:13.870767: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-10-11 20:39:13.870941: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:65:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Although I haven't figured out how to solve the NUMA node error, I found some clues in another issue (I ran all of the above in WSL Ubuntu). Judging by an explanation on the NVIDIA forums, this bug does not seem to be significant. So I guess the registration errors might have something to do with the latest version, and the NUMA errors might be caused by the OS environment. Hope this information helps some folks.

AthiemoneZero avatar Oct 11 '23 14:10 AthiemoneZero

@AthiemoneZero I tried downgrading as well, but it didn't work for me. The NUMA errors are (as stated in the error message) because the kernel provided by Microsoft for WSL2 is not built with NUMA support. I tried cloning the repo (here) and building from source my own with NUMA support, but that didn't work, so I am just ignoring those errors for now.
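To see exactly what TensorFlow is failing to read here, you can query the same sysfs file yourself. A minimal sketch (the PCI address 0000:01:00.0 is taken from the log above; on a WSL2 kernel the file is typically missing, and on bare-metal single-socket machines it often contains -1):

```python
from pathlib import Path


def numa_node(pci_addr: str):
    """Read the NUMA node that sysfs reports for a PCI device.

    Returns the node number, or None if the file is missing
    (e.g. a WSL2 kernel built without NUMA support).
    """
    path = Path(f"/sys/bus/pci/devices/{pci_addr}/numa_node")
    try:
        return int(path.read_text())
    except (FileNotFoundError, ValueError):
        return None


print(numa_node("0000:01:00.0"))
```

A value of -1 means "no NUMA affinity reported", which TensorFlow maps to node zero with only an informational message; a missing file produces the "could not open file" line seen in the logs above.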

joshuacuellar1 avatar Oct 11 '23 14:10 joshuacuellar1

@Ke293-x2Ek-Qe-7-aE-B I rebuilt everything in an independent conda environment called TF. My steps were to create the TF env with Python 3.9.8 and try python3 -m pip install tensorflow[and-cuda] --user according to the instructions. After that I tried python3 -m pip install "tensorflow[and-cuda]==2.13.0" --user and found that it solved the bug.

AthiemoneZero avatar Oct 11 '23 14:10 AthiemoneZero

@AthiemoneZero Thanks for the instructions. I'll try and see if it works on my system. I have been using python 3.10, so maybe that's why it didn't work. Did you have to install the CUDA toolkit?

joshuacuellar1 avatar Oct 11 '23 14:10 joshuacuellar1

@Ke293-x2Ek-Qe-7-aE-B I didn't execute conda install cuda-toolkit here. I guess the [and-cuda] extra helped me install those dependencies.

AthiemoneZero avatar Oct 11 '23 14:10 AthiemoneZero

But I did double-check the versions of CUDA and cuDNN. For this I even downgraded them again and again.

AthiemoneZero avatar Oct 11 '23 14:10 AthiemoneZero

@AthiemoneZero Usually, I would install the CUDA toolkit according to these instructions (here), then install cuDNN according to these instructions (here). I installed CUDA toolkit version 11.8 and cuDNN version 8.7, because they are the latest supported by TensorFlow, according to their support table here. I guess using [and-cuda] installs all of that for you.
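One thing worth checking when mixing the pip route with a system install: the [and-cuda] extra places the NVIDIA libraries under site-packages rather than /usr/local/cuda, and the loader may not find them at import time. The sketch below mirrors the LD_LIBRARY_PATH workaround that circulated in the TF 2.14 install notes; the exact package layout is an assumption, so verify the paths in your own environment before relying on it:

```shell
# Find where pip put cuDNN (path layout is an assumption -- check that the
# directory actually exists in your site-packages).
CUDNN_PATH=$(dirname "$(python3 -c 'import nvidia.cudnn; print(nvidia.cudnn.__file__)')")
export LD_LIBRARY_PATH="$CUDNN_PATH/lib:$LD_LIBRARY_PATH"

# Re-run the GPU check with the loader path pointing at the pip-installed libs.
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```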

joshuacuellar1 avatar Oct 11 '23 14:10 joshuacuellar1

@Ke293-x2Ek-Qe-7-aE-B Apologies for my misunderstanding. I installed the CUDA toolkit the same way you described above before I went directly to debugging tf_gpu. I made sure my GPU and CUDA perform well, as I have run another CUDA task smoothly without TF. What concerns me is that some TF dependencies have to be pre-installed in a conda env, and this might be handled by [and-cuda] (my naive guess).

AthiemoneZero avatar Oct 11 '23 15:10 AthiemoneZero

@AthiemoneZero I always install CUDA toolkit and cuDNN globally for the whole system, and then install TensorFlow in a miniconda environment. This doesn't work anymore with the newest versions of TensorFlow, so I'll try your instructions. It does make sense to install everything in a conda env, I just hadn't thought of that since my other method had worked in the past. Thanks for sharing what you did to make it work.

joshuacuellar1 avatar Oct 11 '23 15:10 joshuacuellar1

@Ke293-x2Ek-Qe-7-aE-B You're welcome. BTW, I also followed the instructions to configure the development environment, including suitable versions of Bazel and clang-16, just before digging into the conda env.

AthiemoneZero avatar Oct 11 '23 15:10 AthiemoneZero

@AthiemoneZero Thanks, but it didn't work.

joshuacuellar1 avatar Oct 11 '23 15:10 joshuacuellar1

Hello,

I'm experiencing the same issue, even though I meticulously followed all the instructions for setting up CUDA 11.8 and CuDNN 8.7. The error messages I'm encountering are as follows:

Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered. Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered. Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered.

I've tried this with different versions of Python. Surprisingly, when I used Python 3.11, TensorFlow 2.13 was installed without these errors. However, when I used Python 3.10 or 3.9, I ended up with TensorFlow 2.14 and the aforementioned errors.

I've come across information suggesting that I may not need to manually install CUDA and CuDNN, as [and-cuda] should handle the installation of these components automatically.

Could someone please guide me on the correct approach to resolve this issue? I've tried various methods, but unfortunately, none of them have yielded a working solution.

P.S. I'm using conda in WSL 2 on Windows 11.

FaisalAlj avatar Oct 17 '23 09:10 FaisalAlj

I am having the same issue as FaisalAlj above, on Windows 10 with the same versions of CUDA and CuDNN. The package tensorflow[and-cuda] is not found by pip. I've tried different versions of python and tensorflow without success. In my case I'm using virtualenv rather than conda.

Edit 1: I appear to be able to install tensorflow[and-cuda] as long as I use quotes around the package, like: pip install "tensorflow[and-cuda]".

Edit 2: I still appear to be getting these messages however, so I'm not sure I've installed things correctly.

nkinnaird avatar Oct 17 '23 15:10 nkinnaird

Hi @Ke293-x2Ek-Qe-7-aE-B ,

I have checked the installation on Colab (Linux environment) and observed the same logs, as per the attached gist.

These logs seem to be generated by the XLA compiler, but the GPU is still detectable. This is similar to issue #62002 and has already been brought to the engineering team's attention.

CC: @learning-to-play
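Since the registration lines appear to be log noise rather than functional failures, one stopgap while this is with the engineering team is to raise TensorFlow's C++ log threshold before the import. A sketch (this reportedly hides the factory-registration errors too, so only use it once you've confirmed the GPU is detected):

```shell
# 0 = all, 1 = hide INFO, 2 = hide INFO+WARNING, 3 = hide INFO+WARNING+ERROR.
export TF_CPP_MIN_LOG_LEVEL=3
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```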

SuryanarayanaY avatar Oct 18 '23 09:10 SuryanarayanaY

@SuryanarayanaY After I did this: https://github.com/tensorflow/tensorflow/issues/62095#issuecomment-1763366758 the CUDA, GPU, and XLA tests returned true and the error was gone.

Syndicateeee avatar Oct 18 '23 20:10 Syndicateeee

@Syndicateeee

Thanks for the suggestion, but it did not really help.

I'm still having problems with the latest TensorFlow 2.14.

FaisalAlj avatar Oct 19 '23 13:10 FaisalAlj

@SuryanarayanaY I still have the same issue. For now, I am just going to install Ubuntu 22.04 instead of using WSL2.

joshuacuellar1 avatar Oct 19 '23 14:10 joshuacuellar1

@Ke293-x2Ek-Qe-7-aE-B go for Ubuntu 20.04. I tried for 2 days to get it running on 22.04; it didn't work.

Syndicateeee avatar Oct 19 '23 14:10 Syndicateeee

This procedure made it work for me.

I tried several recipes to build a virtual environment (for GPU), and this is what ended up working for me (Ubuntu 10.04, cuda-toolkit 11.8, GPU CUDA version 12.0, python 10.13):

  • initiate a virtual environment
  • use the TensorFlow instructions to install TensorFlow, as follows: pip install tensorflow[and-cuda]
  • create an instance on the GPU following the instructions in this procedure, as follows: sudo nvidia-smi mig -cgi 0 -C

With that done, Tensorflow was able to recognize my GPU.

adixxov avatar Oct 20 '23 10:10 adixxov

Hi, I'm running into the same issue as of today after installing tensorflow[and-cuda] (latest released version). From this thread, I'm not sure if people have found the root cause or a workaround. (For the record, I don't have a MIG-enabled card.)

Any advice?

diervo avatar Oct 27 '23 09:10 diervo

Same issue here on both WSL2 and Ubuntu.

>>> import tensorflow as tf
2023-10-28 13:19:49.412588: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-28 13:19:49.413083: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-28 13:19:49.437814: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

BinyanHu avatar Oct 28 '23 02:10 BinyanHu

same error on native ubuntu 20.04, fresh new python environment

back2yes avatar Oct 31 '23 23:10 back2yes

same error on native ubuntu 20.04 +1

bluelancer avatar Nov 02 '23 05:11 bluelancer

Running into a similar issue with a fresh conda environment.

2023-11-06 16:41:34.842775: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-06 16:41:34.863378: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-06 16:41:34.927139: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

sarahmish avatar Nov 06 '23 22:11 sarahmish

Same error for me too on Ubuntu 22.04 with TensorFlow version 2.14.0.

2023-11-07 16:36:12.218005: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-11-07 16:36:12.236705: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-07 16:36:12.236735: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-07 16:36:12.236749: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-07 16:36:12.240263: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-07 16:36:13.173565: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-07 16:36:13.176158: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-07 16:36:13.176267: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-07 16:36:13.177887: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-07 16:36:13.177940: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-07 16:36:13.177975: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-07 16:36:13.226379: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-07 16:36:13.226458: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-07 16:36:13.226509: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-11-07 16:36:13.226557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9487 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4070, pci bus id: 0000:01:00.0, compute capability: 8.9
2023-11-07 16:36:13.450493: W tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.9
2023-11-07 16:36:13.450510: W tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
2023-11-07 16:36:13.450538: W tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc:191] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
2023-11-07 16:36:13.450549: W tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc:191] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
2023-11-07 16:36:13.450561: W tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc:191] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
2023-11-07 16:36:13.450575: W tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc:191] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
2023-11-07 16:36:13.450586: W tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc:191] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
2023-11-07 16:36:13.450595: W tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc:191] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
2023-11-07 16:36:13.450611: W tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc:191] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
2023-11-07 16:36:13.450658: W tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc:191] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.
2023-11-07 16:36:13.450711: W tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc:191] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.

JasOleander avatar Nov 07 '23 07:11 JasOleander