[bug]:
Is there an existing issue for this problem?
- [X] I have searched the existing issues
Operating system
Linux
GPU vendor
AMD (ROCm)
GPU model
RX 6900 XT
GPU VRAM
16 GB
Version number
5.0
Browser
firefox
Python dependencies
3.10
What happened
When I try to Invoke, I get the following error:
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.
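The error message's own debugging hint can be applied before launching. A minimal sketch, assuming InvokeAI is started from a shell via its `invokeai-web` entry point (adjust for your launcher):

```shell
# Serialize HIP kernel launches so the failing kernel is reported at the
# call site instead of asynchronously (per the error message's hint).
export AMD_SERIALIZE_KERNEL=3

# Then launch InvokeAI as usual, e.g.:
# invokeai-web
echo "AMD_SERIALIZE_KERNEL=$AMD_SERIALIZE_KERNEL"
```

With serialization on, the stack trace should point at the actual failing kernel rather than a later API call.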
ComfyUI works fine on the same ROCm setup.
What you expected to happen
generate an image
How to reproduce the problem
No response
Additional context
On the initial install it kept installing the CPU PyTorch, so I uninstalled that and installed the ROCm-compatible PyTorch. On launch it does pick up my AMD Radeon, but it throws an error saying the bitsandbytes setup failed despite there being a CUDA-compatible card. When I run python -m bitsandbytes I get AttributeError: 'NoneType' object has no attribute 'split'. It also looks like I'm getting a patchmatch compile error. Here is the log:
Could not load bitsandbytes native library: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
File "/home/sagar/miniconda3/envs/invoke/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 109, in
CUDA Setup failed despite CUDA being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
patchmatch.patch_match: INFO - Compiling and loading c extensions from "/home/sagar/miniconda3/envs/invoke/lib/python3.10/site-packages/patchmatch".
patchmatch.patch_match: ERROR - patchmatch failed to load or compile (Command 'make clean && make' returned non-zero exit status 2.).
patchmatch.patch_match: INFO - Refer to https://invoke-ai.github.io/InvokeAI/installation/060_INSTALL_PATCHMATCH/ for installation instructions.
[2024-09-28 00:10:44,825]::[InvokeAI]::INFO --> Patchmatch not loaded (nonfatal)
[2024-09-28 00:10:45,939]::[InvokeAI]::INFO --> Using torch device: AMD Radeon Graphics
[2024-09-28 00:10:46,169]::[InvokeAI]::INFO --> cuDNN version: 3001000
[2024-09-28 00:10:46,183]::[uvicorn.error]::INFO --> Started server process [38431]
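The bitsandbytes message above is about library discovery via LD_LIBRARY_PATH. A minimal sketch of extending it, assuming a ROCm install under /opt/rocm (that path is an assumption; check where your ROCm or CUDA runtime libraries actually live first):

```shell
# See whether the HIP runtime is where we expect it (adjust to your install):
ls /opt/rocm/lib/libamdhip64* 2>/dev/null || echo "no ROCm libs at /opt/rocm/lib"

# Prepend the directory so bitsandbytes / PyTorch can dlopen the runtime:
export LD_LIBRARY_PATH="/opt/rocm/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```

Re-running python -m bitsandbytes in the same shell then shows whether the libraries are found.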
Discord username
lordoflaziness
I was able to solve the bitsandbytes issue as well as the patchmatch issue. I also set the device ID to my card, so on boot it says torch device: AMD 6900XT. But every time I try to invoke I still get the same error message:
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.
I am also having this issue with my AMD 7800XT. I was able to fix the bitsandbytes issue by following their install guide.
cd [InvokeAI dir]
source .venv/bin/activate
pip install --no-deps 'https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.44.1.dev0-py3-none-manylinux_2_24_x86_64.whl' # the --force-reinstall broke a bunch of the other InvokeAI deps
The runtime error above happens both on a fresh install of InvokeAI and after the bitsandbytes update.
After some reading I disabled the onboard graphics unit of the AMD CPU.
But this did not resolve the issue - I also get:
HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
This is really confusing, because I had it running with the 3.x version of InvokeAI last year on the same machine with the same approach to the ROCm installation (using the original AMD repos!).
All ROCm tests suggested by AMD succeed.
How can we resolve this?
Thanks for reading.
Same issue. Please help/fix!
I had the same issue in docker, and found the following:
I searched the AUTOMATIC1111 issues for this and found https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/14763. Adding HSA_OVERRIDE_GFX_VERSION=11.0.0 to the .env solves the issue; this should be documented, though.
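For reference, the workaround amounts to setting one environment variable. A minimal sketch; the 11.0.0 value matches RDNA3 cards like the 7800 XT, while RDNA2 cards such as the 6900 XT typically need 10.3.0 instead (`rocminfo | grep gfx` shows what your GPU actually reports):

```shell
# Docker setup: add the override to the .env file Compose reads:
echo 'HSA_OVERRIDE_GFX_VERSION=11.0.0' >> .env

# Bare-metal install: export it in the shell that launches InvokeAI:
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```

Using the wrong gfx override can itself cause "invalid device function", so match it to your card's generation.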
Same problem here with Invoke 5.7.1 on AMD Ryzen 9 8945HS w/ Radeon 780M Graphics, Linux 6.11 on Ubuntu 24.04.
Getting this on an NVIDIA setup with Docker. This system has working ComfyUI and other AI containers, so I'm not sure what is wrong. When the Invoke container starts:
invokeai-cuda-1 | [2025-04-21 11:08:27,756]::[InvokeAI]::INFO --> Using torch device: CPU
invokeai-cuda-1 | Could not load bitsandbytes native library: /opt/venv/lib/python3.12/site-packages/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
invokeai-cuda-1 | Traceback (most recent call last):
invokeai-cuda-1 | File "/opt/venv/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 85, in <module>
invokeai-cuda-1 | lib = get_native_library()
invokeai-cuda-1 | ^^^^^^^^^^^^^^^^^^^^
invokeai-cuda-1 | File "/opt/venv/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 72, in get_native_library
invokeai-cuda-1 | dll = ct.cdll.LoadLibrary(str(binary_path))
invokeai-cuda-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
invokeai-cuda-1 | File "/root/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/ctypes/__init__.py", line 460, in LoadLibrary
invokeai-cuda-1 | return self._dlltype(name)
invokeai-cuda-1 | ^^^^^^^^^^^^^^^^^^^
invokeai-cuda-1 | File "/root/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/ctypes/__init__.py", line 379, in __init__
invokeai-cuda-1 | self._handle = _dlopen(self._name, mode)
invokeai-cuda-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^
invokeai-cuda-1 | OSError: /opt/venv/lib/python3.12/site-packages/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
invokeai-cuda-1 | /opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py:734: UserWarning: Can't initialize NVML
invokeai-cuda-1 | warnings.warn("Can't initialize NVML")
invokeai-cuda-1 | /opt/venv/lib/python3.12/site-packages/torch/cuda/__init__.py:734: UserWarning: Can't initialize NVML
invokeai-cuda-1 | warnings.warn("Can't initialize NVML")
invokeai-cuda-1 | [2025-04-21 11:08:29,071]::[InvokeAI]::INFO --> cuDNN version: 90100
invokeai-cuda-1 | [2025-04-21 11:08:29,896]::[InvokeAI]::INFO --> Patchmatch initialized
invokeai-cuda-1 | [2025-04-21 11:08:30,562]::[InvokeAI]::INFO --> InvokeAI version 5.10.1
And when generating with Flux:
RuntimeError: All input tensors need to be on the same GPU, but found some tensors to not be on a GPU: [(torch.Size([256, 4096]), device(type='cpu'))]
It does generate with SDXL, but falls back to CPU.
I have CUDA 12.8 installed on the system with driver 570.133.07.
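"Using torch device: CPU" together with the NVML warnings usually means the container never got access to the GPU. A hedged sketch of the Compose-side GPU reservation; the service name `invokeai-cuda` is inferred from the log prefix, and this assumes the NVIDIA Container Toolkit is installed on the host:

```yaml
services:
  invokeai-cuda:
    # ...image, ports, volumes as before...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

If the reservation is already present, `docker compose exec invokeai-cuda nvidia-smi` is a quick way to confirm the device is visible inside the container.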