
CUDA allocation on memory-only NUMA nodes fails

Sacusa opened this issue 7 months ago • 2 comments

NVIDIA Open GPU Kernel Modules Version

570.124.06

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • [ ] I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Ubuntu 22.04.4 LTS

Kernel Release

5.15.0-113-generic

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • [x] I am running on a stable kernel release.

Hardware: GPU

NVIDIA A100-PCIE-40GB

Describe the bug

I am unable to allocate memory using CUDA on memory-only NUMA nodes. I wrote a simple program (see next section) that first tries to allocate pinned host memory using cudaHostAlloc. If that fails, it tries to allocate regular host memory using cudaMalloc. The output of numactl -H on my machine looks like this:

node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110
node 0 size: 128399 MB
node 0 free: 116688 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101 103 105 107 109 111
node 1 size: 128955 MB
node 1 free: 119579 MB
node 2 cpus:
node 2 size: 507904 MB
node 2 free: 507897 MB
node 3 cpus:
node 3 size: 507904 MB
node 3 free: 507904 MB
node distances:
node   0   1   2   3
  0:  10  20  17  28
  1:  20  10  28  17
  2:  17  28  10  28
  3:  28  17  28  10

Note how nodes 2 and 3 are memory-only. When I run cuda_numa, it allocates pinned memory successfully and exits. It also runs successfully if I bind memory allocation to node 0 or 1 by running numactl -m 0,1 ./cuda_numa. But if I bind memory allocation to nodes 2 or 3 by running numactl -m 2,3 ./cuda_numa, both the pinned and the regular memory allocation fail with the following error:

Error allocating pinned host memory: CUDA-capable device(s) is/are busy or unavailable
Error allocating regular host memory: CUDA-capable device(s) is/are busy or unavailable

This used to work fine with the closed-source driver, revision 550, which I recently upgraded from. I haven't been able to try the closed-source version of revision 570 yet.

To Reproduce

You need a machine with at least one NUMA node that is memory-only (i.e., has no CPUs).

  1. Compile cuda_numa.cu using nvcc (I used V12.8.93): nvcc -o cuda_numa cuda_numa.cu
  2. Run the binary and bind its allocation to the memory-only NUMA node X: numactl -m X ./cuda_numa
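For reference, the program is roughly the following. This is a minimal sketch consistent with the description above; the exact source of cuda_numa.cu may differ in allocation size and error handling.

// cuda_numa.cu -- minimal sketch; allocation size and messages are illustrative.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t size = 1 << 20;  // 1 MiB, arbitrary test size
    void *ptr = nullptr;

    // Try pinned (page-locked) host memory first.
    cudaError_t err = cudaHostAlloc(&ptr, size, cudaHostAllocDefault);
    if (err == cudaSuccess) {
        printf("Pinned host memory allocated successfully\n");
        cudaFreeHost(ptr);
        return 0;
    }
    fprintf(stderr, "Error allocating pinned host memory: %s\n",
            cudaGetErrorString(err));

    // Fall back to a regular allocation via cudaMalloc, as described above.
    err = cudaMalloc(&ptr, size);
    if (err == cudaSuccess) {
        printf("Regular memory allocated successfully\n");
        cudaFree(ptr);
        return 0;
    }
    fprintf(stderr, "Error allocating regular host memory: %s\n",
            cudaGetErrorString(err));
    return 1;
}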

Bug Incidence

Always

nvidia-bug-report.log.gz

More Info

No response

Sacusa • Jun 21 '25 16:06

Update: the issue seems to persist with the proprietary driver (570.124.06) too.

Sacusa • Jun 23 '25 14:06

+1 @Sacusa Did you find any solution or workaround?

Basavaraja-MS • Sep 22 '25 14:09