CUDA allocation on memory-only NUMA nodes fails
NVIDIA Open GPU Kernel Modules Version
570.124.06
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- [ ] I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Ubuntu 22.04.4 LTS
Kernel Release
5.15.0-113-generic
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- [x] I am running on a stable kernel release.
Hardware: GPU
NVIDIA A100-PCIE-40GB
Describe the bug
I am unable to allocate memory using CUDA on memory-only NUMA nodes. I wrote a simple program (see the To Reproduce section below) that first tries to allocate pinned host memory using cudaHostAlloc. If that fails, it tries to allocate regular host memory using cudaMalloc. The output of numactl -H on my machine looks like this:
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110
node 0 size: 128399 MB
node 0 free: 116688 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101 103 105 107 109 111
node 1 size: 128955 MB
node 1 free: 119579 MB
node 2 cpus:
node 2 size: 507904 MB
node 2 free: 507897 MB
node 3 cpus:
node 3 size: 507904 MB
node 3 free: 507904 MB
node distances:
node   0   1   2   3
  0:  10  20  17  28
  1:  20  10  28  17
  2:  17  28  10  28
  3:  28  17  28  10
Note how nodes 2 and 3 are memory-only. When I run cuda_numa, it allocates pinned memory successfully and exits. It also runs successfully if I bind memory allocation to node 0 or 1 by running numactl -m 0,1 cuda_numa. But if I bind memory allocation to nodes 2 or 3 by running numactl -m 2,3 cuda_numa, both the pinned and the regular allocation fail with the following errors:
Error allocating pinned host memory: CUDA-capable device(s) is/are busy or unavailable
Error allocating regular host memory: CUDA-capable device(s) is/are busy or unavailable
This used to work fine with the closed-source driver, revision 550, which I recently upgraded from. I haven't been able to test the closed-source build of revision 570 yet.
To Reproduce
You need a machine with at least one NUMA node that is memory-only (i.e., has no CPUs).
- Compile cuda_numa.cu using nvcc (I used V12.8.93):
  nvcc -o cuda_numa cuda_numa.cu
- Run the binary and bind its memory allocation to the memory-only NUMA node X (a minimal sketch of cuda_numa.cu follows these steps):
  numactl -m X ./cuda_numa
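For reference, here is a minimal sketch of what cuda_numa.cu could look like, based on the behavior described above (a pinned host allocation via cudaHostAlloc, with a fallback allocation via cudaMalloc). This is an assumption for illustration only; the actual reproducer may differ in allocation size and flags.

// cuda_numa.cu -- hypothetical sketch of the reproducer described in this report
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t size = 1 << 20;  // arbitrary 1 MiB test allocation
    void *pinned = nullptr;

    // First attempt: pinned host memory via cudaHostAlloc
    cudaError_t err = cudaHostAlloc(&pinned, size, cudaHostAllocDefault);
    if (err == cudaSuccess) {
        printf("Pinned host allocation succeeded\n");
        cudaFreeHost(pinned);
        return EXIT_SUCCESS;
    }
    fprintf(stderr, "Error allocating pinned host memory: %s\n", cudaGetErrorString(err));

    // Fallback: regular allocation via cudaMalloc, as described in the report
    void *fallback = nullptr;
    err = cudaMalloc(&fallback, size);
    if (err == cudaSuccess) {
        printf("Regular allocation succeeded\n");
        cudaFree(fallback);
        return EXIT_SUCCESS;
    }
    fprintf(stderr, "Error allocating regular host memory: %s\n", cudaGetErrorString(err));
    return EXIT_FAILURE;
}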
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
No response
Update: the issue seems to persist with the proprietary driver (570.124.06) too.
+1 @Sacusa Did you find any solution or workaround?