cannot import name '_compare_version' from 'torchmetrics.utilities.imports'
Describe the bug
Attempting to run the docker image results in:
ImportError: cannot import name '_compare_version' from 'torchmetrics.utilities.imports' (/usr/local/lib/python3.10/dist-packages/torchmetrics/utilities/imports.py)
To Reproduce
Steps to reproduce the behavior:
- docker build . -t stable-diffusion-rocm
- docker run -it -p 7860:7860 --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /home/karl/stable-diffusion:/pwd -e HSA_OVERRIDE_GFX_VERSION=10.3.0 --name stable-diffusion stable-diffusion-rocm
Expected behavior
It should complete initialization.
** Container Output **
Building wheels for collected packages: lit
Building wheel for lit (pyproject.toml) ... done
Created wheel for lit: filename=lit-16.0.6-py3-none-any.whl size=93605 sha256=860b3739ae0c10b7d5237825a8963745aa4af6aa1a2d2bf6019323fb3356120b
Stored in directory: /root/.cache/pip/wheels/14/f9/07/bb2308587bc2f57158f905a2325f6a89a2befa7437b2d7e137
Successfully built lit
Installing collected packages: mpmath, lit, cmake, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, pytorch-triton-rocm, torch, torchvision
Successfully installed MarkupSafe-2.1.3 certifi-2023.7.22 charset-normalizer-3.2.0 cmake-3.27.2 filelock-3.12.2 idna-3.4 jinja2-3.1.2 lit-16.0.6 mpmath-1.3.0 networkx-3.1 numpy-1.25.2 pillow-10.0.0 pytorch-triton-rocm-2.0.2 requests-2.31.0 sympy-1.12 torch-2.0.1+rocm5.4.2 torchvision-0.15.2+rocm5.4.2 typing-extensions-4.7.1 urllib3-2.0.4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into /sd/repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into /sd/repositories/taming-transformers...
Cloning K-diffusion into /sd/repositories/k-diffusion...
Cloning CodeFormer into /sd/repositories/CodeFormer...
Cloning BLIP into /sd/repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements
Launching Web UI with arguments: --port 7860
Traceback (most recent call last):
File "/sd/launch.py", line 370, in <module>
start()
File "/sd/launch.py", line 361, in start
import webui
File "/sd/webui.py", line 24, in <module>
import pytorch_lightning # pytorch_lightning should be imported after torch, but it re-enables warnings on import so import once to disable them
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/__init__.py", line 34, in <module>
from pytorch_lightning.callbacks import Callback # noqa: E402
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/__init__.py", line 25, in <module>
from pytorch_lightning.callbacks.progress import ProgressBarBase, RichProgressBar, TQDMProgressBar
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/progress/__init__.py", line 22, in <module>
from pytorch_lightning.callbacks.progress.rich_progress import RichProgressBar # noqa: F401
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/progress/rich_progress.py", line 20, in <module>
from torchmetrics.utilities.imports import _compare_version
ImportError: cannot import name '_compare_version' from 'torchmetrics.utilities.imports' (/usr/local/lib/python3.10/dist-packages/torchmetrics/utilities/imports.py)
Desktop (please complete the following information):
- System RAM & SWAP: 32gb + 0
- AMD GPU & VRAM: 6600XT 8GB
- OS + Distro and Version: Ubuntu 22.04
- Host ROCm Version: 5.6.0
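So after some digging, it's caused by inter-package breakage that appeared within the past two weeks: the installed pytorch_lightning still imports _compare_version, which the installed torchmetrics no longer provides. I got it to work by manually pinning some package versions in the Dockerfile: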
# Preinstall dependencies. This will fail
RUN python -d launch.py --exit --skip-torch-cuda-test || true
# Make fixes
RUN --mount=type=cache,target=/root/.cache/pip \
pip3 install torchmetrics==0.11.4 && \
pip3 install 'gradio>=3.36.1' && \
pip3 install fastapi==0.95.2 && \
true
# Preinstall dependencies again
RUN python -d launch.py --exit --skip-torch-cuda-test
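To confirm the pins actually took effect inside the container, something like the following can be run with the container's Python. This is only a rough sketch: the `EXPECTED_PINS` table and the `check_pins` helper are names made up for illustration, and only the exact `==` pins from the snippet above are checked (the `gradio>=3.36.1` range is left out).

```python
from importlib import metadata

# Exact pins copied from the Dockerfile fix above (hypothetical table name).
EXPECTED_PINS = {
    "torchmetrics": "0.11.4",
    "fastapi": "0.95.2",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(EXPECTED_PINS)
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed}")
    if not problems:
        print("all pinned packages match")
```

If a later step in the build (for example the second launch.py run) reinstalls a different version, the mismatch shows up here instead of as an ImportError at startup.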
Note: You can also move the model downloading part to your Dockerfile:
# Pre-download model
RUN wget -q https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors -O /sd/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
That gives a docker instance with a first-time launch of less than 30 seconds.
Everything working a-ok now :) Thanks for putting this out there!
BTW, the main stable-diffusion-webui branch now works with ROCm if you just make a one-line change: https://github.com/kstenerud/stable-diffusion-webui/commit/a08711d713bfeb2155f084b7d9f9a28ce6f3ac43
Then you can get a fully-functional SD webui like so:
FROM ubuntu:jammy
SHELL ["/bin/bash", "-c"]
ENV PORT=7860 \
DEBIAN_FRONTEND=noninteractive \
PYTHONUNBUFFERED=1 \
PYTHONIOENCODING=UTF-8 \
REQS_FILE='requirements.txt' \
COMMANDLINE_ARGS=''
WORKDIR /opt
RUN apt-get -y update && \
apt-get install -y --no-install-recommends libstdc++-12-dev ca-certificates wget gnupg2 gawk curl git libglib2.0-0 apt-utils python3.10-venv python3-pip && \
wget https://repo.radeon.com/amdgpu-install/5.5/ubuntu/jammy/amdgpu-install_5.5.50500-1_all.deb && \
apt-get install -y ./amdgpu-install_5.5.50500-1_all.deb && \
amdgpu-install -y --usecase=rocm --no-dkms && \
true
RUN git clone -b rocm https://github.com/kstenerud/stable-diffusion-webui.git /sd
WORKDIR /sd
RUN apt-get autoremove -y && \
apt-get clean -y && \
rm -rf /var/lib/apt/lists/* && \
python3 -m venv venv && \
source venv/bin/activate && \
ln -s /usr/bin/python3 /usr/bin/python && \
python3 -m pip install --upgrade pip wheel
# Pre-download model
RUN wget -q https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors -O /sd/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
# Preinstall dependencies. This will fail.
RUN python -d launch.py --exit --skip-torch-cuda-test || true
# Apply fixes
# pytorch_lightning: No module named 'pytorch_lightning.utilities.distributed'
# torchmetrics: cannot import name '_compare_version' from 'torchmetrics.utilities.imports'
# fastapi: AttributeError: __config__
RUN --mount=type=cache,target=/root/.cache/pip \
pip3 install pytorch_lightning==1.7.5 && \
pip3 install torchmetrics==0.11.4 && \
pip3 install fastapi==0.95.2 && \
true
# SD output image format problem: SyntaxError: not a TIFF file (header b"b'Exif\\x" not valid)
# Just run the previous output image through another image editor and it'll work again.
# Preinstall dependencies again
RUN python -d launch.py --exit --skip-torch-cuda-test
EXPOSE ${PORT}
VOLUME [ "/sd/configs","/sd/models", "/sd/outputs","/sd/extensions", "/sd/plugins"]
ENTRYPOINT python -d launch.py --port "${PORT}" --listen || bash