Picologging is deadlocking when multiple threads are logging
We have an issue with picologging in our application that uses multiple threads. After some time it deadlocks. I managed to reproduce this behavior with a very simple script:
import threading
import picologging as logging
if __name__ == '__main__':
logging.basicConfig(level=logging.INFO, format='[%(name)s] [%(thread)d]: %(message)s')
def log_in_a_loop(name: str):
logger = logging.getLogger(name)
while True:
logger.info('log message')
threading.Thread(name="worker-1", target=log_in_a_loop, args=["worker-1"]).start()
log_in_a_loop("main-thread")
When I replace picologging with the standard logging package the code does not deadlock
This script deadlocks almost instantaneously after being run, producing only a handful of log messages.
The analysis with pystack (attached at the bottom) shows that one thread is holding a GIL and trying to acquire a lock in Handler_handle while the other thread that does not have the GIL is blocked in StreamHandler_emit on pthread_cond_timedwait.
I looked at the picologging source code and I don't see any call to a timedwait on condition.
Traceback for thread 37490 (python) [] (most recent call last):
(C) File "../sysdeps/unix/sysv/linux/x86_64/clone.S", line 95, in __clone (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
(C) File "/build/glibc-wuryBv/glibc-2.31/nptl/pthread_create.c", line 477, in start_thread (/usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
(Python) File "/usr/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
Arguments:
self: <Thread at 0x7fa5dc6b3fd0>
(Python) File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
Arguments:
self: <Thread at 0x7fa5dc6b3fd0>
(Python) File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
Arguments:
self: <Thread at 0x7fa5dc6b3fd0>
(Python) File "/home/piotrjanisz/work/logging_test/main.py", line 12, in log_in_a_loop
logger.info('log message')
Arguments:
name: "worker-1"
Locals:
logger: <picologging.Logger at 0x7fa5dbb08030>
(C) File "???", line 0, in Logger_logAndHandle(LoggerT*, _object*, _object*, unsigned short) (/home/piotrjanisz/.virtualenvs/logging_test/lib/python3.10/site-packages/picologging/_picologging.cpython-310-x86_64-linux-gnu.so)
(C) File "???", line 0, in Handler_handle(Handler*, _object*) (/home/piotrjanisz/.virtualenvs/logging_test/lib/python3.10/site-packages/picologging/_picologging.cpython-310-x86_64-linux-gnu.so)
(C) File "???", line 0, in StreamHandler_emit(StreamHandler*, _object* const*, long) (/home/piotrjanisz/.virtualenvs/logging_test/lib/python3.10/site-packages/picologging/_picologging.cpython-310-x86_64-linux-gnu.so)
(C) File "/build/glibc-wuryBv/glibc-2.31/nptl/pthread_cond_wait.c", line 665, in pthread_cond_timedwait@@GLIBC_2.3.2 (/usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
(C) File "/build/glibc-wuryBv/glibc-2.31/nptl/pthread_cond_wait.c", line 520, in __pthread_cond_wait_common (inlined) (/usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
(C) File "../sysdeps/nptl/futex-internal.h", line 320, in futex_abstimed_wait_cancelable (inlined) (/usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
Traceback for thread 37489 (python) [Has the GIL] (most recent call last):
(C) File "???", line 0, in _start (/usr/bin/python3.10)
(C) File "../csu/libc-start.c", line 308, in __libc_start_main (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
(C) File "???", line 0, in Py_BytesMain (/usr/bin/python3.10)
(C) File "???", line 0, in Py_RunMain (/usr/bin/python3.10)
(Python) File "/home/piotrjanisz/work/logging_test/main.py", line 17, in <module>
log_in_a_loop("main-thread")
(Python) File "/home/piotrjanisz/work/logging_test/main.py", line 12, in log_in_a_loop
logger.info('log message')
Arguments:
name: "main-thread"
Locals:
logger: <picologging.Logger at 0x7fa5dbb080f0>
(C) File "???", line 0, in Logger_logAndHandle(LoggerT*, _object*, _object*, unsigned short) (/home/piotrjanisz/.virtualenvs/logging_test/lib/python3.10/site-packages/picologging/_picologging.cpython-310-x86_64-linux-gnu.so)
(C) File "???", line 0, in Handler_handle(Handler*, _object*) (/home/piotrjanisz/.virtualenvs/logging_test/lib/python3.10/site-packages/picologging/_picologging.cpython-310-x86_64-linux-gnu.so)
(C) File "../nptl/pthread_mutex_lock.c", line 115, in __pthread_mutex_lock (/usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
(C) File "/build/glibc-wuryBv/glibc-2.31/nptl/lowlevellock.c", line 52, in __lll_lock_wait (/usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
What might be happening is:
- Despite having separate loggers, both thread will have the same handler (and the same C++ lock inside the handler), since it will end up being the handler of the root logger.
- The first thread (thread 37490) has GIL and, calls into
Handler_handle, which additionally takes the C++ lock. - With the C++ lock held,
Handler_handlecalls intoStreamHandler_emitwhich may execute Python code (PyObject_CallMethod_ONEARG) and AFAIK Python code releases GIL from time to time to let other thread proceed. Maybe otherPy*function also can release GIL - I don't know. Anyway, GIL got released. - With the GIL released, the second thread (thread 37489) can proceed, enters
Handler_handleand blocks on waiting for the C++ lock. - The first thread wakes, want the GIL back in some function inlined into
StreamHandler_emit, which callspthread_cond_timedwait. The weak point of this theory is that although there is a functiontake_gilthat inlines some functions usingpthread_cond_timedwait, the former is not inlineable. But maybe there is some other function that is or if was different in a former version of Python? - We get a classic deadlock - first thread is waiting for GIL locked by the second thread, while the second thread is waiting on the C++ lock locked by first one.
Thus, due to the last point I'm not 100% sure it that's the deadlock in question, but I think it's a possible deadlock.
Same issues here