
[BUG] GPU-PV broken for Windows 11 24H2

Open dj-vandijk opened this issue 10 months ago • 2 comments

Describe the bug

We're using the GPU-PV functionality of AKS-Edge, and it has been working great for us so far. However, after updating one of our machines from Windows 11 23H2 to 24H2, the nvidia-device-plugin no longer works. It fails to start with the error:

Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: nvml error: unable to load the nvml library: unknown
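For anyone triaging the same message: the failing step is the dynamic load of the NVML shared library. The sketch below is only an assumption-laden way to test that step in isolation from inside the Linux node or a container. It is not part of AKS Edge Essentials or the NVIDIA tooling; the function name `try_load_nvml` is made up for illustration, and the candidate file names are the usual NVML sonames, which may not match where the library is exposed on a GPU-PV node.

```python
import ctypes


def try_load_nvml(candidates=("libnvidia-ml.so.1", "libnvidia-ml.so")):
    """Try to dlopen each candidate library name in order.

    Returns (loaded_name, errors). loaded_name is None if nothing loaded;
    errors maps each failed candidate to the loader's error message.
    The candidate names are assumptions, not values taken from this issue.
    """
    errors = {}
    for name in candidates:
        try:
            ctypes.CDLL(name)  # raises OSError if the loader cannot find or open it
            return name, errors
        except OSError as exc:
            errors[name] = str(exc)
    return None, errors


if __name__ == "__main__":
    loaded, errors = try_load_nvml()
    if loaded:
        print(f"NVML library loaded: {loaded}")
    else:
        print("Could not load the NVML library:")
        for name, message in errors.items():
            print(f"  {name}: {message}")
```

If no candidate loads, the loader messages show whether the file is missing from the library search path or present but unloadable, which narrows down where the 24H2 regression surfaces.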

To Reproduce

Steps to reproduce the behavior:

  1. Install Windows 11 24H2
  2. Follow the GPU acceleration guide from: https://learn.microsoft.com/en-us/azure/aks/aksarc/aks-edge-gpu
  3. Observe the error "unable to load the nvml library" when the nvidia-device-plugin starts

Environment (please complete the following information):

  • AKS Edge Essentials Version: 1.9.262.0
  • Kubernetes version: 1.29.6
  • Windows Host OS
    • Edition: Professional
    • Version: 24H2 build 26100.3476
  • NVIDIA RTX A5000
  • NVIDIA driver 572.83

dj-vandijk, Apr 08 '25 13:04

@dj-vandijk Thank you for creating this issue. This is a known issue that we plan to fix in the next release (1.11). Thanks!

SummerSmith, Apr 29 '25 21:04

@SummerSmith Is there an ETA for the 1.11 release?

dj-vandijk, Jun 10 '25 11:06