sentry-python

Segfault in contextvars HAMT during isolation_scope() / new_scope() with uvicorn multiprocessing workers

Open sysradium opened this issue 1 month ago • 2 comments

How do you use Sentry?

Self-hosted/on-premise

Version

2.45.0

Steps to Reproduce

Environment

  • sentry-sdk version: 2.45.0
  • Python version: 3.10.x
  • OS: Linux (x86_64)
  • Framework: FastAPI + uvicorn with --workers 10 (multiprocessing spawn)
  • Event loop: uvloop 0.21.0

Problem

Workers crash with SIGSEGV when Sentry SDK calls _current_scope.set() inside isolation_scope() or new_scope(). The crash occurs in Python's HAMT (Hash Array Mapped Trie) implementation used by contextvars.

GDB analysis shows that the HAMT node pointer has been corrupted - it points to a PyFunctionObject (_safe_repr_wrapper from sentry_sdk/serializer.py) instead of a HAMT node. When Python tries to clone this "node", it reads the function's vectorcall field as if it were b_array, then crashes trying to Py_INCREF a C function pointer.
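For readers unfamiliar with the pattern involved, the following is a minimal standard-library sketch of the set/reset token idiom that scope context managers of this kind rely on. It is not the SDK's actual code: `new_scope_sketch` and the dict payloads are hypothetical stand-ins, and only the `contextvars` calls are real. In CPython, each `ContextVar.set()` produces an updated copy of the context's HAMT, which is the code path where the crash occurs.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-in for the SDK's module-level ContextVar.
_current_scope = contextvars.ContextVar("current_scope", default=None)


@contextmanager
def new_scope_sketch(scope):
    """Bind `scope` for the duration of the block, then restore the previous value."""
    # set() returns a token that remembers the previous value.
    token = _current_scope.set(scope)
    try:
        yield scope
    finally:
        # reset() restores whatever was bound before set() was called.
        _current_scope.reset(token)


with new_scope_sketch({"request": 1}) as outer:
    with new_scope_sketch({"request": 2}):
        inner_seen = _current_scope.get()
    outer_seen = _current_scope.get()
after = _current_scope.get()
```

In a healthy interpreter the nested scopes unwind cleanly, so `after` ends up back at the default value.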

Stack trace:

#0  _Py_INCREF (op=0x7b4d9a872260 <_PyFunction_Vectorcall>) at ./Include/object.h:472
#1  _Py_XINCREF (op=0x7b4d9a872260 <_PyFunction_Vectorcall>) at ./Include/object.h:558
#2  hamt_node_bitmap_clone (node=0x7b4c301c7e20) at Python/hamt.c:583
#3  hamt_node_bitmap_assoc (...) at Python/hamt.c:771
#4  _PyHamt_Assoc (...) at Python/hamt.c:2308
#5  contextvar_set (var=0x7b4d9916a5c0, val=<Scope at remote 0x7b4c30d62b60>) at Python/context.c:738
#6  PyContextVar_Set (ovar=<_contextvars.ContextVar at remote 0x7b4d9916a5c0>, ...) at Python/context.c:285
...
#12 _PyEval_EvalFrame (...) for file sentry_sdk/scope.py, line 1785, in isolation_scope

Python stack trace

File "sentry_sdk/scope.py", line 1785, in isolation_scope
    current_token = _current_scope.set(forked_current_scope)
File "contextlib.py", line 135, in __enter__
    return next(self.gen)
File "sentry_sdk/integrations/asgi.py", line 198, in _run_app
    with sentry_sdk.isolation_scope() as sentry_scope:

Corrupted HAMT node:

(gdb) p *node
$2 = {
  ob_base = {ob_type = 0x7b4d9aa8a7a0 <PyFunction_Type>},
  b_bitmap = 2587471808,
  b_array = {'_safe_repr_wrapper'}  # Does not look like HAMT data
}

(gdb) p node->b_array[0]
$5 = '_safe_repr_wrapper'

(gdb) x/20x 0x7b4d9a872260
0x7b4d9a872260 <_PyFunction_Vectorcall>: ....

Sentry configuration

sentry_sdk.init(
    dsn=settings.SENTRY_DSN,
    traces_sample_rate=1.0,
    profiles_sample_rate=0.0,
    enable_tracing=True,
    send_default_pii=True,
    integrations=[
        FastApiIntegration(),
        HttpxIntegration(),
        AioHttpIntegration(),
        RedisIntegration(),
    ],
)

How to reproduce

  1. FastAPI app with sentry-sdk 2.45.0
  2. Run with uvicorn app:app --workers 10 (multiprocessing spawn)
  3. Send concurrent HTTP requests
  4. Workers eventually crash with SIGSEGV
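Step 3 can be approximated with a small standard-library load script. This is only a sketch: the URL, request count, and concurrency level are assumptions and not the values used when the crash was observed, and `hammer` and `fetch_status` are hypothetical helper names.

```python
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url: str) -> str:
    """Return the HTTP status as a string, or the exception class name on failure."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return str(resp.status)
    except Exception as exc:  # connection errors are likely when a worker dies
        return type(exc).__name__


def hammer(url: str, total: int = 1000, concurrency: int = 50, fetch=fetch_status) -> Counter:
    """Issue `total` GET requests from `concurrency` threads and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch, [url] * total))


if __name__ == "__main__":
    # Assumed address (uvicorn's default bind); adjust to the real endpoint.
    print(hammer("http://127.0.0.1:8000/"))
```

A rise in non-200 outcomes in the tally, together with SIGSEGV messages in the uvicorn log, would indicate that a worker went down during the run.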

Observations

  • Crash happens in spawned worker processes, not the main process
  • The corrupted pointer always points to Sentry SDK's _safe_repr_wrapper function or _PyFunction_Vectorcall
  • Workers that don't use Sentry SDK (different project, same infrastructure) don't crash
  • Same codebase works fine in single-process mode

We tried setting OPENBLAS_NUM_THREADS=1 and it did not help.

It appears that a function object created during Sentry serialization (_safe_repr_wrapper) is somehow being written to a location that should contain a HAMT node pointer.

This could be:

  • A use-after-free where HAMT memory is reused for function objects
  • A buffer overflow in serialization code
  • Race condition in scope/context handling with multiprocessing
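One low-cost way to gather more evidence for any of these hypotheses is the standard-library `faulthandler` module, which writes the Python-level traceback of every thread when the process receives SIGSEGV. The sketch below assumes it is called at import time of the application module so that each spawned worker enables it for itself; `enable_crash_tracebacks` is a hypothetical helper name.

```python
import faulthandler
import sys


def enable_crash_tracebacks(file=sys.stderr) -> bool:
    """Dump Python tracebacks for all threads to `file` on a fatal signal.

    `file` must have a real file descriptor and must stay open for as long
    as the handler is enabled.
    """
    faulthandler.enable(file=file, all_threads=True)
    return faulthandler.is_enabled()
```

Setting the environment variable `PYTHONFAULTHANDLER=1` before starting uvicorn has a similar effect without code changes.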

Expected Result

No segfault

Actual Result

Eventual segfault

sysradium, Dec 03 '25 10:12

PY-2002

linear[bot], Dec 03 '25 10:12

thx for the detailed report @sysradium

unfortunately, contextvars + multiprocessing is an old issue in the python ecosystem.

The only suggestion I have for now is to try moving the sentry_sdk.init call into the FastAPI lifespan, so that each worker gets fresh Sentry-related state. https://fastapi.tiangolo.com/advanced/events/#lifespan

from contextlib import asynccontextmanager
from fastapi import FastAPI
import sentry_sdk

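@asynccontextmanager
async def lifespan(app: FastAPI):
    # Runs once in each worker process, before it starts serving requests.
    sentry_sdk.init(...)
    yield  # required: the app serves requests while suspended here

app = FastAPI(lifespan=lifespan)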

sl0thentr0py, Dec 10 '25 13:12

@sl0thentr0py thanks for the explanation. I was thinking of making this change, but was unsure whether the FastAPI integration would still work correctly.

Are there any issues with initialising FastApiIntegration inside the lifespan?

Also, just to mention: downgrading to 1.45.1 resolved the problem, but obviously we would love to upgrade to the latest v2 release.

sysradium, Dec 18 '25 15:12