CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Open: pconesa opened this issue 3 years ago • 1 comment

Hi, on some machines our installation (scipion-topaz) works fine, but on our test server topaz does not find the GPUs.

The topaz stderr output is:

CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Falling back to CPU.
# using device=0 with cuda=False
# Loading model: unet
# 1 of 10 completed.
# 2 of 10 completed.
# 3 of 10 completed.
# 4 of 10 completed.
# 5 of 10 completed.
# 6 of 10 completed.
# 7 of 10 completed.
# 8 of 10 completed.
# 9 of 10 completed.
# 10 of 10 completed.
CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Falling back to CPU.
CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Falling back to CPU.
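
The log shows topaz silently continuing on the CPU (`# using device=0 with cuda=False`). On a test server it can help to make that fallback fail loudly. A minimal Python sketch, assuming the stderr lines keep exactly the format shown above (inferred from this one log, not from topaz documentation); `summarize_topaz_stderr` is a hypothetical helper:

```python
import re

# Matches the device line seen in this log, e.g. "# using device=0 with cuda=False".
# The format is an assumption taken from this single log.
DEVICE_RE = re.compile(r"^# using device=(-?\d+) with cuda=(True|False)[ \t]*$", re.MULTILINE)


def summarize_topaz_stderr(text):
    """Summarize whether a topaz run actually used a GPU, based on its stderr text."""
    devices = [(int(dev), flag == "True") for dev, flag in DEVICE_RE.findall(text)]
    return {
        "devices": devices,  # (device index, cuda flag) for every device line found
        "cpu_fallbacks": text.count("Falling back to CPU."),
        "used_gpu": any(cuda for _, cuda in devices),
    }
```

A CI step could, for example, pass the captured stderr to this function and mark the job as failed when `used_gpu` is False.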

The nvidia-smi output is:

buildbot@scipionbox:~$ nvidia-smi 
Fri Sep  9 16:47:05 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 27%   38C    P8     7W / 151W |      6MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:81:00.0 Off |                  N/A |
| 34%   35C    P8     6W / 151W |      6MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Quadro M4000        Off  | 00000000:82:00.0 Off |                  N/A |
| 46%   37C    P8    11W / 120W |     24MiB /  8125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3783      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      3783      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      3783      G   /usr/lib/xorg/Xorg                 21MiB |
+-----------------------------------------------------------------------------+
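
One detail worth noting: the `CUDA Version` in the nvidia-smi header is the highest CUDA version the installed driver supports, not proof that a CUDA toolkit is installed. A small sketch for pulling both numbers out of the header text; `parse_smi_header` is a hypothetical helper, and the regex assumes the header layout shown above:

```python
import re

# Header row as printed above, e.g.
# "| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |"
HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_smi_header(text):
    """Return (driver_version, max_supported_cuda) from nvidia-smi's text output."""
    match = HEADER_RE.search(text)
    if match is None:
        raise ValueError("no 'Driver Version ... CUDA Version' header found")
    return match.group(1), match.group(2)
```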

The environment info is:

(topaz-0.2.5) buildbot@scipionbox:~$ conda list
# packages in environment at /home/buildbot/anaconda3/envs/topaz-0.2.5:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
blas                      1.0                         mkl  
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2022.07.19           h06a4308_0  
certifi                   2021.5.30        py36h06a4308_0  
cudatoolkit               11.3.1               h2bc3f7f_2  
dataclasses               0.8                pyh4f3eec9_6  
ffmpeg                    4.3                  hf484d3e_0    pytorch
freetype                  2.11.0               h70c0345_0  
future                    0.18.2                   py36_1  
gmp                       6.2.1                h295c915_3  
gnutls                    3.6.15               he1e5248_0  
intel-openmp              2022.1.0          h9e868ea_3769  
joblib                    1.0.1              pyhd3eb1b0_0  
jpeg                      9e                   h7f8727e_0  
lame                      3.100                h7b6447c_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
lerc                      3.0                  h295c915_0  
libdeflate                1.8                  h7f8727e_5  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 11.2.0               h1234567_1  
libgfortran-ng            7.5.0               ha8ba4b0_17  
libgfortran4              7.5.0               ha8ba4b0_17  
libgomp                   11.2.0               h1234567_1  
libiconv                  1.16                 h7f8727e_2  
libidn2                   2.3.2                h7f8727e_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              11.2.0               h1234567_1  
libtasn1                  4.16.0               h27cfd23_0  
libtiff                   4.4.0                hecacb30_0  
libunistring              0.9.10               h27cfd23_0  
libuv                     1.40.0               h7b6447c_0  
libwebp-base              1.2.2                h7f8727e_0  
lz4-c                     1.9.3                h295c915_1  
mkl                       2020.2                      256  
mkl-service               2.3.0            py36he8ac12f_0  
mkl_fft                   1.3.0            py36h54f3939_0  
mkl_random                1.1.1            py36h0573a6f_0  
ncurses                   6.3                  h5eee18b_3  
nettle                    3.7.3                hbbd107a_1  
numpy                     1.19.2           py36h54aff64_0  
numpy-base                1.19.2           py36hfa32c7d_0  
olefile                   0.46                     py36_0  
openh264                  2.1.1                h4ff587b_0  
openjpeg                  2.4.0                h3ad879b_0  
openssl                   1.1.1q               h7f8727e_0  
pandas                    1.1.5            py36ha9443f7_0  
pillow                    8.3.1            py36h2c7a002_0  
pip                       21.2.2           py36h06a4308_0  
python                    3.6.13               h12debd9_1  
python-dateutil           2.8.2              pyhd3eb1b0_0  
pytorch                   1.10.2          py3.6_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pytz                      2021.3             pyhd3eb1b0_0  
readline                  8.1.2                h7f8727e_1  
scikit-learn              0.24.2           py36ha9443f7_0  
scipy                     1.5.2            py36h0b6359f_0  
setuptools                58.0.4           py36h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.39.2               h5082296_0  
threadpoolctl             2.2.0              pyh0d69192_0  
tk                        8.6.12               h1ccaba5_0  
topaz                     0.2.5                      py_0    tbepler
torchvision               0.11.3               py36_cu113    pytorch
typing_extensions         4.1.1              pyh06a4308_0  
wheel                     0.37.1             pyhd3eb1b0_0  
xz                        5.2.5                h7f8727e_1  
zlib                      1.2.12               h5eee18b_3  
zstd                      1.5.2                ha4553b6_0 

I can see that the pytorch build targets CUDA 11.3, but we have CUDA 11.4. Is this a problem?
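
In general, a newer driver can run programs built against an older CUDA runtime, so a pytorch build for CUDA 11.3 on a driver that reports 11.4 should not be a problem by itself. A conservative check along those lines, assuming the conda build-string naming seen in this environment (`py3.6_cuda11.3_cudnn8.2.0_0`); both function names are hypothetical:

```python
import re


def cuda_from_build(build):
    """Extract (major, minor) CUDA from a conda pytorch build string.

    Assumes naming like "py3.6_cuda11.3_cudnn8.2.0_0". Returns None when no
    "cudaX.Y" tag is present (for example a CPU-only build).
    """
    match = re.search(r"cuda(\d+)\.(\d+)", build)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_covers_build(build, driver_cuda):
    """Conservative check: driver's max CUDA ("X.Y" from nvidia-smi) >= build's CUDA."""
    built = cuda_from_build(build)
    if built is None:
        return False  # not a CUDA build, so the driver cannot help
    major, minor = (int(part) for part in driver_cuda.split(".")[:2])
    return built <= (major, minor)  # tuple comparison: (11, 3) <= (11, 4)
```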

This is the one-line command we use to install topaz:

. /home/buildbot/anaconda3/etc/profile.d/conda.sh&&conda create -y -n topaz-0.2.5 python=3.6 &&conda activate topaz-0.2.5 &&conda install -y topaz=0.2.5 cudatoolkit -c tbepler -c pytorch

Should we pin more specific versions of cudatoolkit or pytorch?
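
Whether or not versions are pinned, it is possible to check what conda actually resolved. A sketch that parses `conda list` output and flags a mismatch between the pytorch build and `cudatoolkit`. It assumes the column layout shown above, and it assumes (from the pytorch channel's conventions as we understand them) that `pytorch-mutex` has build `cuda` for GPU installs and `cpu` for CPU-only ones. The helper names are hypothetical:

```python
def parse_conda_list(text):
    """Parse `conda list` output into {name: (version, build, channel)}."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip '#' header lines and anything without name/version/build columns.
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        channel = fields[3] if len(fields) > 3 else ""  # blank channel column in the listing
        packages[fields[0]] = (fields[1], fields[2], channel)
    return packages


def gpu_stack_problems(packages):
    """Return reasons the environment may not be a CUDA pytorch matching cudatoolkit."""
    pytorch = packages.get("pytorch")
    if pytorch is None:
        return ["pytorch is not installed"]
    problems = []
    mutex = packages.get("pytorch-mutex")
    if mutex is not None and mutex[1] != "cuda":
        problems.append("pytorch-mutex build is %r, not 'cuda'" % mutex[1])
    toolkit = packages.get("cudatoolkit")
    if toolkit is None:
        problems.append("cudatoolkit is not installed")
    else:
        major_minor = ".".join(toolkit[0].split(".")[:2])  # "11.3.1" -> "11.3"
        if "cuda" + major_minor not in pytorch[1]:
            problems.append("pytorch build %s does not match cudatoolkit %s"
                            % (pytorch[1], toolkit[0]))
    return problems
```

For the environment listed above this reports no problems, which is consistent with the pytorch/cudatoolkit pairing not being the cause.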

pconesa, Sep 09 '22 15:09

Got more info: it seems that although nvidia-smi shows CUDA 11.4, there is no CUDA 11.4 installed, or at least not in the usual /usr/local/cuda*** location.
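
A missing `/usr/local/cuda*` is not necessarily the problem: a conda-installed pytorch takes the CUDA runtime from the environment's `cudatoolkit` package and only needs the NVIDIA driver from the system. A Linux-only sketch that collects the system-side facts; `cuda_system_report` is hypothetical, and running it both from an interactive shell and from inside the test job may show whether the two see the same driver:

```python
import ctypes.util
import glob
import os


def cuda_system_report():
    """Collect system-level facts relevant to 'Found no NVIDIA driver' warnings (Linux)."""
    return {
        # System-wide toolkits; conda's cudatoolkit installs into the env, not here.
        "system_toolkits": sorted(glob.glob("/usr/local/cuda*")),
        # Device nodes created by the NVIDIA kernel driver, e.g. /dev/nvidia0.
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
        # Driver library (libcuda) as located by ctypes' library search, or None.
        "libcuda": ctypes.util.find_library("cuda"),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
    }
```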

pconesa, Sep 10 '22 15:09

@pconesa Did you figure out a solution to this? It sounds like an issue with your CUDA and/or pytorch installation rather than topaz itself.
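
To separate pytorch from topaz, one can ask pytorch directly what it sees in the same conda environment. A minimal sketch using standard PyTorch calls (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`); the wrapper function `torch_cuda_report` is hypothetical:

```python
import importlib.util


def torch_cuda_report():
    """Report what PyTorch in the active environment sees, independent of topaz."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
```

If `cuda_available` is also False here, the problem lies below topaz, in the pytorch/driver setup.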

tbepler, Oct 21 '22 04:10

I actually don't know what we did, but it is fixed now. Thanks!

pconesa, Oct 21 '22 09:10