Strange issues importing Numpy/OpenBLAS related to ulimit
I am attempting to run a simple Python script:
#!/usr/bin/env python3
import numpy
This fails due to:
OpenBLAS blas_thread_init: pthread_create failed for thread 21 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1029364 current, 1029364 max
OpenBLAS blas_thread_init: pthread_create failed for thread 22 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1029364 current, 1029364 max
OpenBLAS blas_thread_init: pthread_create failed for thread 23 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1029364 current, 1029364 max
OpenBLAS blas_thread_init: pthread_create failed for thread 24 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1029364 current, 1029364 max
OpenBLAS blas_thread_init: pthread_create failed for thread 25 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1029364 current, 1029364 max
...
(full log attached, Python 3.9.18, numpy 1.26.4, libopenblas 0.3.24)
Here are my initial ulimits:
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 300000
-c: core file size (blocks) unlimited
-m: resident set size (kbytes) 8388608
-u: processes 1029364
-n: file descriptors 16384
-l: locked-in-memory size (kbytes) unlimited
-v: address space (kbytes) 8388608
-x: file locks unlimited
-i: pending signals 1029364
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited
The error appears to be due to thread allocation, however:
- OpenBLAS fails after allocating only 20 threads on a fresh user login (with ~8 threads running initially). This should come nowhere near 1029364 total.
- Setting ulimit -v to unlimited (or to at least around 67108684) fixes the issue.
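To see what limit the process is actually running under, the soft and hard address-space limits can be inspected from Python (this is a diagnostic sketch using the standard resource module, not part of OpenBLAS itself):

```python
import resource

# RLIMIT_AS is the address-space limit, i.e. what `ulimit -v` controls.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)

def fmt(v):
    return "unlimited" if v == resource.RLIM_INFINITY else v

print("RLIMIT_AS soft:", fmt(soft))
print("RLIMIT_AS hard:", fmt(hard))
```

On the system above this would report roughly 8 GB, which the per-thread BLAS buffers quickly exhaust.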
This looks related to an issue reported against numpy in 2022. That thread appears to be dead, and the final comment suggested bringing it up here. Any ideas what might be happening?
Yes, looks like you are running out of address space for the memory buffer that is used to communicate partial results between threads. The output of RLIMIT_NPROC was added only because this seemed to be the limit one is most likely to hit, I don't recall address space being a problem before.
300MB stack is excessive...
Unusual, but it may have been set during testing. I'm more intrigued by the low limit on address space (virtual memory) that is causing the problem here; I'm used to seeing it default to "unlimited" on any reasonably modern hardware.
For context, this ulimit -a is from an HPC system head node. The limits were imposed by the system administrators to enforce fair usage. Perhaps unsurprisingly, OpenBLAS is not the only library or program that this ulimit -v setting causes to crash.
I've been working with them to find a solution (some of the head nodes have a hard limit of 8-16 GB), but in the meantime I was curious why OpenBLAS was reporting an issue with ulimit -u when ulimit -v seemed to be the root cause. Would it be possible to modify OpenBLAS to report the correct problem, and/or suggest possible solutions (e.g., reducing OPENBLAS_NUM_THREADS)? This could be helpful to any future users who run into this issue.
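As a stopgap when the limit itself cannot be raised, the thread pool can be capped from the script. A minimal sketch (the thread count of 4 is an arbitrary example; the key point is that the environment variable must be set before numpy is first imported, since OpenBLAS spawns its threads at load time):

```python
import os

# Cap the OpenBLAS thread pool BEFORE numpy is imported; each BLAS thread
# reserves its own memory buffer, so fewer threads need less address space.
os.environ["OPENBLAS_NUM_THREADS"] = "4"

import numpy as np  # noqa: E402  (must come after the env var is set)

# A trivial BLAS-backed operation to confirm the import succeeded.
print(np.dot(np.ones(2), np.ones(2)))  # -> 2.0
```

Setting the variable in the shell (export OPENBLAS_NUM_THREADS=4) before launching Python has the same effect.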
Thanks for your help so far!
This is simply because issues with ulimit -u are the only ones documented on the fork(2) manpage to raise EAGAIN, and the only cause of fork-related early aborts encountered so far.
Hello, I am getting the same issue. What was the solution for this?
Setting the virtual address space limit (ulimit -v) to a larger size or "unlimited", so that each thread can allocate its memory buffer as needed.
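If raising the limit in the shell is awkward (e.g. inside a batch script), the soft limit can also be raised from Python itself, up to the administrator-imposed hard limit. A sketch using the standard resource module; note this cannot exceed the hard limit, so on a head node with a hard cap it only helps if the soft limit was set lower:

```python
import resource

# Raise the soft address-space limit to the hard limit for this process
# (the in-process equivalent of `ulimit -v`); children inherit the change.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (hard, hard))
```

This must run before the first import of numpy for OpenBLAS's thread startup to benefit.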