
Does not Recognize External Thunderbolt Radeon GPU, Neural Compute Stick, or internal Intel OneAPI GPU

Open • iamhumanipromise opened this issue Apr 14, 2023 • 3 comments

Not sure where to start, and this may land nowhere, but in the process of discovery here I am. This post gets rather long, so GPT-4 summarizes it as follows: The issue aims to enhance Easy Diffusion's functionality by enabling it to recognize and utilize all available devices and cores in a system, including CPUs, NVIDIA and AMD GPUs, Compute Sticks, and Intel OneAPI GPUs. The current behavior only detects CPU cores or NVIDIA GPUs. The author seeks a solution that can manage multiple backends and virtual environments, such as Anaconda or Docker, to improve performance and avoid "out of memory" situations. Although their Python skills are limited, the author is willing to contribute by testing potential solutions.

Hello! I'm here to explore the possibility of enhancing Easy Diffusion's functionality to better utilize all available devices and cores in a system.

Expected behavior: Easy Diffusion should recognize all devices and use them collectively, avoiding "out of memory" situations and fully utilizing all available cores during a session, rather than using just one device or the other.

Current behavior: Easy Diffusion only recognizes CPU cores or the NVIDIA GPU. AMD GPU, Compute Stick, and Intel OneAPI GPU are not detected.
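(For context, a minimal probe like the sketch below, which is not part of Easy Diffusion, illustrates why this happens: a stock PyTorch build only exposes the backends it was compiled for, so AMD, Intel, or Movidius devices are simply invisible to it. The torch.xpu check assumes the intel_extension_for_pytorch naming.)

```python
# Minimal illustration: what a stock PyTorch build can actually "see".
import torch

print("CPU threads:", torch.get_num_threads())
print("CUDA available:", torch.cuda.is_available())  # True only on a CUDA (or ROCm) build
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print("  device", i, torch.cuda.get_device_name(i))

# Intel GPUs are only reachable through intel_extension_for_pytorch (torch.xpu),
# and the Neural Compute Stick only through OpenVINO, not PyTorch at all.
print("XPU backend present:", hasattr(torch, "xpu"))
```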

Hardware Setup:

  • Core i7-9750H 6-core / 12-thread CPU (Coffee Lake Refresh, 14 nm, AVX2)
  • 32GB DDR4-2666MHz
  • Integrated Intel UHD 630 CFL GT2 GPU with 24 EU (Gen9.5), sharing the 32GB of system RAM with the Core i7
  • Intel Neural Compute Stick 2 with 12 SHAVE cores and 4GB VPU RAM (Movidius Myriad cores, soon to be integrated into 14th-gen CPUs)
  • NVIDIA RTX 2080 Max-Q with 8GB GDDR6 VRAM (internal)
  • AMD Vega Frontier Edition with 16GB HBM2 VRAM, in a Razer Thunderbolt enclosure

Operating Environment:

I am using EndeavourOS rolling release (downstream of Arch Linux, but plain Arch is being installed as we speak for testing). The system has all the necessary toolkits and SDKs installed for each GPU (see the attached packages-list-foreign.txt and packages-list-native.txt), including:

  • CUDA
  • TensorRT
  • OptiX
  • HIP-Runtime-AMD and all associated HIP packages (for ROCm)
  • All Intel Level Zero and OneAPI packages, compilers, runtimes, and headers
  • OpenVINO and drivers for the neural compute stick
  • OpenCV
  • OpenVDK
  • OSPRay
  • OpenVKL
  • OpenIMPI
  • OpenMPI with HIP backend

Upcoming or Existing Practical Examples of Diffusion for On-Device Distributed Workloads / Personal HSA Systems:

  • OpenVINO + OneAPI: Upcoming 14th Generation Intel CPUs with integrated Movidius and Xe (Arc-based) graphics cores, along with additional add-on Intel Arc GPUs
  • ROCm/HIP/OpenSYCL + CUDA: Ryzen 7000-series CPU paired with NVIDIA GPU and/or Radeon GPUs
  • OneAPI+OpenVINO+HIP/ROCm: 14th Generation Intel CPU (or 13th generation + compute stick) paired with AMD graphics solutions

Summary:

I previously posted this issue to the Arch4Edu group, which focuses on creating custom packages. The response suggested using Anaconda or Docker to create one environment per backend (CPU, neural compute, CUDA, ROCm, OneAPI, etc.), plus an interface to link them together. Unfortunately, my Python skills are limited, and I cannot develop a unified, multi-platform, multi-SDK Easy Diffusion backend myself. However, I can contribute by testing!

Link to Arch4Edu Github
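(A rough sketch of what such an "interface" might look like, assuming one virtual environment per backend, each with its own PyTorch build, and a thin dispatcher that farms prompts out to them. The venv paths and the render_worker.py helper are hypothetical, not existing Easy Diffusion features.)

```python
# Hypothetical dispatcher: one venv per backend, each with its own PyTorch
# build, and a small wrapper that runs each job in the matching environment.
import subprocess
from concurrent.futures import ThreadPoolExecutor

WORKERS = {
    "cuda": "/opt/venvs/cuda/bin/python",    # venv with a CUDA PyTorch build
    "rocm": "/opt/venvs/rocm/bin/python",    # venv with a ROCm PyTorch build
    "xpu":  "/opt/venvs/oneapi/bin/python",  # venv with intel_extension_for_pytorch
}

def render(backend: str, prompt: str) -> bytes:
    """Run one job inside the venv that matches the requested backend."""
    out = subprocess.run(
        [WORKERS[backend], "render_worker.py", "--device", backend, "--prompt", prompt],
        capture_output=True, check=True,
    )
    return out.stdout  # e.g. the path of the generated image

prompts = ["a red fox", "a snowy mountain", "a lighthouse at dusk"]
with ThreadPoolExecutor(max_workers=len(WORKERS)) as pool:
    jobs = [pool.submit(render, b, p) for b, p in zip(WORKERS, prompts)]
    results = [j.result() for j in jobs]
```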

Also of Note:

If this mysterious "neural fabric" were ever created, making Easy Diffusion work across multiple platforms on one PC via virtual environments connected by an "interlink" that does not yet exist, it could also apply to a multi-device, on-network, or WAN-distributed Easy Diffusion session.

iamhumanipromise avatar Apr 14 '23 00:04 iamhumanipromise

PyTorch itself, which runs all of this ML code, needs to be compiled for specific devices, and PyTorch only supports a couple: https://pytorch.org/get-started/locally/

For example, a Radeon card needs a ROCm-compatible PyTorch build. Beyond that, Diffusers itself may also not support multiple special-case devices, as it is a wrapper for PyTorch operations.
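(A quick way to see which backend a given PyTorch build was compiled against; the exact version strings depend on the wheel that is installed.)

```python
# Check which backend the installed torch was built for.
import torch

print("torch", torch.__version__)
print("built for CUDA:", torch.version.cuda)  # e.g. "12.1", or None on CPU/ROCm builds
print("built for ROCm:", torch.version.hip)   # e.g. "5.6", or None on CPU/CUDA builds
```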

I haven't done much with Diffusers in months; I mainly use WebUI or ComfyUI. I have a node package for ComfyUI that has a lot of the features I put into Easy Diffusion: https://github.com/WASasquatch/was-node-suite-comfyui

WASasquatch avatar Apr 14 '23 02:04 WASasquatch

I am using Arch Linux, which has multiple PyTorch packages. There is a "pytorch-opt-rocm" with ROCm + AVX2 CPU optimizations, a "pytorch-opt-cuda" for CUDA + AVX2, and a "pytorch-tensorrt" for the Tensor cores.

I will see if I can use the default PKGBUILD templates for each as a recipe to compile a system package that supports all of them. If not, I will ask on the Arch forums... and from there, I suppose, the PyTorch project itself about supporting multiple backends.

That being said, I thought I recently saw a campaign/project that "permits" accessing multiple backends across multiple virtual environments in order to "slice" the jobs and then distribute them across each VE.

iamhumanipromise avatar Apr 20 '23 22:04 iamhumanipromise

I haven't seen anything. But inherently, from the get-go, these sorts of projects assume you are starting with a fresh venv, or one that comes with nothing on it, so that the correct compatible packages can then be installed.
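(For illustration, the usual install-time approach looks roughly like this: probe the machine once and install the matching PyTorch wheel into the fresh venv. The index URLs follow the pattern documented on pytorch.org, but the specific CUDA/ROCm versions shown here are assumptions.)

```python
# Sketch: pick one PyTorch wheel index per machine, then install into a fresh venv.
import shutil, subprocess, sys

def pick_index_url() -> str:
    if shutil.which("nvidia-smi"):
        return "https://download.pytorch.org/whl/cu121"    # CUDA build
    if shutil.which("rocminfo"):
        return "https://download.pytorch.org/whl/rocm5.6"   # ROCm build
    return "https://download.pytorch.org/whl/cpu"            # CPU fallback

subprocess.run(
    [sys.executable, "-m", "pip", "install", "torch", "--index-url", pick_index_url()],
    check=True,
)
```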

WASasquatch avatar Apr 21 '23 00:04 WASasquatch