CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Hi, on some machines our installation (scipion-topaz) works fine, but on our test server topaz is not finding the GPUs.
Topaz stderr output is:
CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Falling back to CPU.
# using device=0 with cuda=False
# Loading model: unet
# 1 of 10 completed.
# 2 of 10 completed.
# 3 of 10 completed.
# 4 of 10 completed.
# 5 of 10 completed.
# 6 of 10 completed.
# 7 of 10 completed.
# 8 of 10 completed.
# 9 of 10 completed.
# 10 of 10 completed.
CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Falling back to CPU.
CudaWarning: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
Falling back to CPU.
nvidia-smi output is:
buildbot@scipionbox:~$ nvidia-smi
Fri Sep 9 16:47:05 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:03:00.0 Off | N/A |
| 27% 38C P8 7W / 151W | 6MiB / 8119MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:81:00.0 Off | N/A |
| 34% 35C P8 6W / 151W | 6MiB / 8119MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Quadro M4000 Off | 00000000:82:00.0 Off | N/A |
| 46% 37C P8 11W / 120W | 24MiB / 8125MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3783 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 3783 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 3783 G /usr/lib/xorg/Xorg 21MiB |
+-----------------------------------------------------------------------------+
Environment info is:
(topaz-0.2.5) buildbot@scipionbox:~$ conda list
# packages in environment at /home/buildbot/anaconda3/envs/topaz-0.2.5:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
ca-certificates 2022.07.19 h06a4308_0
certifi 2021.5.30 py36h06a4308_0
cudatoolkit 11.3.1 h2bc3f7f_2
dataclasses 0.8 pyh4f3eec9_6
ffmpeg 4.3 hf484d3e_0 pytorch
freetype 2.11.0 h70c0345_0
future 0.18.2 py36_1
gmp 6.2.1 h295c915_3
gnutls 3.6.15 he1e5248_0
intel-openmp 2022.1.0 h9e868ea_3769
joblib 1.0.1 pyhd3eb1b0_0
jpeg 9e h7f8727e_0
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libdeflate 1.8 h7f8727e_5
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.2 h7f8727e_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.16.0 h27cfd23_0
libtiff 4.4.0 hecacb30_0
libunistring 0.9.10 h27cfd23_0
libuv 1.40.0 h7b6447c_0
libwebp-base 1.2.2 h7f8727e_0
lz4-c 1.9.3 h295c915_1
mkl 2020.2 256
mkl-service 2.3.0 py36he8ac12f_0
mkl_fft 1.3.0 py36h54f3939_0
mkl_random 1.1.1 py36h0573a6f_0
ncurses 6.3 h5eee18b_3
nettle 3.7.3 hbbd107a_1
numpy 1.19.2 py36h54aff64_0
numpy-base 1.19.2 py36hfa32c7d_0
olefile 0.46 py36_0
openh264 2.1.1 h4ff587b_0
openjpeg 2.4.0 h3ad879b_0
openssl 1.1.1q h7f8727e_0
pandas 1.1.5 py36ha9443f7_0
pillow 8.3.1 py36h2c7a002_0
pip 21.2.2 py36h06a4308_0
python 3.6.13 h12debd9_1
python-dateutil 2.8.2 pyhd3eb1b0_0
pytorch 1.10.2 py3.6_cuda11.3_cudnn8.2.0_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2021.3 pyhd3eb1b0_0
readline 8.1.2 h7f8727e_1
scikit-learn 0.24.2 py36ha9443f7_0
scipy 1.5.2 py36h0b6359f_0
setuptools 58.0.4 py36h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.39.2 h5082296_0
threadpoolctl 2.2.0 pyh0d69192_0
tk 8.6.12 h1ccaba5_0
topaz 0.2.5 py_0 tbepler
torchvision 0.11.3 py36_cu113 pytorch
typing_extensions 4.1.1 pyh06a4308_0
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.5 h7f8727e_1
zlib 1.2.12 h5eee18b_3
zstd 1.5.2 ha4553b6_0
I can see that the installed pytorch build targets CUDA 11.3, but nvidia-smi reports CUDA 11.4. Is this a problem?
This is the one-line command we use to install topaz:
. /home/buildbot/anaconda3/etc/profile.d/conda.sh && conda create -y -n topaz-0.2.5 python=3.6 && conda activate topaz-0.2.5 && conda install -y topaz=0.2.5 cudatoolkit -c tbepler -c pytorch
Should we be more specific about the versions of cudatoolkit or pytorch?
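On the 11.3 versus 11.4 question: the "CUDA Version" field in nvidia-smi is the highest CUDA runtime version the installed driver supports, not a toolkit installed on the machine, and the conda cudatoolkit package ships its own runtime libraries. A toolkit that is not newer than what the driver supports should therefore be usable. The check below is only a minimal sketch of that rule; the function names are made up for illustration, and NVIDIA's real compatibility rules have more cases than a plain major.minor comparison.

```python
def parse_version(text):
    """Turn '11.4' or '11.3.1' into (major, minor) as a tuple of ints."""
    parts = text.strip().split(".")
    return tuple(int(p) for p in parts[:2])


def driver_supports_toolkit(driver_cuda, toolkit_cuda):
    """Simplified rule (an assumption, not NVIDIA's full matrix):
    the toolkit is fine if it is not newer than the version nvidia-smi reports."""
    return parse_version(toolkit_cuda) <= parse_version(driver_cuda)


if __name__ == "__main__":
    # Values taken from the nvidia-smi and conda list output in this thread.
    print(driver_supports_toolkit("11.4", "11.3.1"))
```

By this rule the combination in the listing above (driver reporting 11.4, cudatoolkit 11.3.1) would not be the cause of the warning.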
Got more info: it seems that although nvidia-smi shows CUDA 11.4, there is no CUDA 11.4 installed, or at least not in the regular /usr/local/cuda***
@pconesa Did you figure out a solution to this? It sounds like an issue with your CUDA and/or pytorch installation rather than topaz itself.
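One way to narrow this down is to ask pytorch directly, from inside the same conda environment and as the same user that runs topaz. The script below is a hedged sketch rather than anything topaz provides: `cuda_report` is a hypothetical helper name, and including `CUDA_VISIBLE_DEVICES` is an assumption that the launching environment might be setting it, since an empty or invalid value hides the GPUs from pytorch.

```python
import importlib
import os


def cuda_report():
    """Collect what pytorch sees in the current environment (hypothetical helper)."""
    report = {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": False,
        "torch_version": None,
        "torch_built_for_cuda": None,
        "cuda_available": False,
        "device_count": 0,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # No pytorch in this environment; return the defaults.
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds.
    report["torch_built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `torch_built_for_cuda` is set but `cuda_available` is False, the problem is most likely between pytorch and the driver (or the environment variables of the process), not in topaz.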
I actually do not know what we did, but it is fixed now. Thanks!