MachineLearningNotebooks
MachineLearningNotebooks copied to clipboard
Default Tensorflow binary does not work; outdated/incompatible cuDNN version
The azureml_py38_PT_TF conda environment is broken and doesn't work as the current (as of 17/05/2023) as the default cuDNN binary is version 6.1, which is not compatible with any version of tensorflow.
Current work around is to manually update the cuDNN binaries as below, using the link from Nvidia
export URL="CUDNN-8.6.0-LINK-FROM-NVIDIA-WEBSITE"
# ==== DOWNLOAD CUDDN ====
curl $URL -o ./cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
sudo tar -xvf ./cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
# ==== INSTALL CUDDN ====
sudo cp ./cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P ./cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
# ==== LINK UPDATED BINARIES ====
sudo ldconfig
# ==== INSTALL CONDA ENV ====
conda create -n "tfgpu" python=3.10 -y
conda activate tfgpu
conda install -c conda-forge cudatoolkit=11.8.0 ipykernel -y
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
python3 -m ipykernel install --user --name tfgpu --display-name "Python (tf-cudnn8.6)"
# ==== VERIFY ====
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Is it possible to bump the included cuDNN version to get around this problem?