`tp_doc` switch from `PyObject_Malloc` to `PyMem_Malloc` is not backwards compatible
Bug report
In https://github.com/python/cpython/pull/114574 we switched a number of non-PyObject allocations from PyObject_Malloc to PyMem_Malloc, including tp_doc on PyHeapTypeObjects.
Unfortunately, this isn't backwards compatible because C-API extensions may allocate tp_doc contents, which are then freed by CPython in type_dealloc. For example, pybind11 allocates memory for the docstring using PyObject_MALLOC. This leads to crashes when using pybind11 in debug builds of Python 3.13: the allocation uses PyObject_MALLOC, but the memory is freed using PyMem_Free.
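To make the failure mode concrete, here is a rough, self-contained model of the mismatch. It is not CPython's code: `tagged_malloc`, `tagged_free`, `copy_doc`, and `header_t` are hypothetical names, and the one-byte tag only imitates the documented behavior of the debug memory hooks, which detect a block being freed through a different allocator domain than the one that allocated it. A real debug build aborts with a fatal error where this sketch returns `-1`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only; every name here is hypothetical, not a CPython API.
   Header stored in front of each block. Only `tag` is used; the other
   members just force suitable alignment for the payload. */
typedef union {
    char tag;
    long long ll_;
    long double ld_;
    void *p_;
} header_t;

/* Allocate `size` bytes and record which domain ('m' or 'o') owns them. */
static void *tagged_malloc(char domain, size_t size) {
    header_t *h = (header_t *)malloc(sizeof(header_t) + size);
    if (h == NULL) {
        return NULL;
    }
    h->tag = domain;
    return h + 1;
}

/* Free a block through `domain`. Returns 0 on success, or -1 (leaving the
   block untouched) if it was allocated through a different domain. */
static int tagged_free(char domain, void *ptr) {
    header_t *h;
    if (ptr == NULL) {
        return 0;
    }
    h = (header_t *)ptr - 1;
    if (h->tag != domain) {
        return -1;
    }
    free(h);
    return 0;
}

/* Copy a docstring into a block owned by `domain`, the way an extension
   would before storing the pointer in tp_doc. */
static char *copy_doc(char domain, const char *doc) {
    size_t n = strlen(doc) + 1;
    char *buf = (char *)tagged_malloc(domain, n);
    if (buf != NULL) {
        memcpy(buf, doc, n);
    }
    return buf;
}
```

In this model, a docstring copied with domain `'o'` (standing in for `PyObject_MALLOC`) is rejected when freed through domain `'m'` (standing in for `PyMem_Free`), which mirrors the pybind11 crash described above.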
We should consider reverting the change to tp_doc and find a way to allocate the doc that is both safe (in the free-threaded build) and backwards compatible (in the default build).
Some example extensions:
- Uses `PyObject_Malloc`
- Uses `strdup`
We don't document the tp_doc behavior so some extensions use strdup, which works fine in release builds (and is thread-safe in the free-threaded build), but probably crashes in debug builds of CPython.
cc @erlend-aasland
Some possible ways to address this:
1. Require C API extensions to use `PyMem_Malloc` for `tp_doc` on heap types going forward. Not ideal because it's a non-backwards compatible change.
2. Revert and require C API extensions to use `PyObject_Malloc` for `tp_doc` on heap types. Not ideal because it's not thread-safe in the free-threaded build.
3. Recommend using `PyMem_Malloc`, but allow extensions to use either `PyObject_Malloc` or `PyMem_Malloc` by adding special logic when we free `tp_doc` internally. It might not be possible to fully detect which allocation method was used if the allocator was overridden, but we can handle the common cases and fall back to `PyObject_Free` for backwards compatibility.
4. Add a new API, e.g., `PyMem_DocMalloc`/`PyMem_DocFree` that we recommend. In the default build, it can be implemented as calls to `PyObject_Malloc`/`Free` for backwards compatibility. In the free-threaded build, we can do extra work at runtime to make the calls thread-safe.
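As a rough illustration of the "special logic when we free `tp_doc`" idea in option 3, the sketch below models a free routine that inspects how a block was allocated and dispatches accordingly, falling back to the object domain when it cannot tell. Everything here is hypothetical: `doc_alloc`, `doc_free_any`, the one-byte tag, and the counters are stand-ins for illustration, and real detection inside CPython would have to work without such a tag when the allocator has been overridden.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only; every name here is hypothetical, not a CPython API.
   Only `tag` is used; the other members just force alignment. */
typedef union {
    char tag;
    long long ll_;
    long double ld_;
    void *p_;
} header_t;

/* Counters standing in for "freed via PyMem_Free" / "freed via PyObject_Free". */
static int mem_frees = 0;
static int obj_frees = 0;

/* Copy a docstring into a block tagged with the allocating domain:
   'm' models PyMem_Malloc, 'o' models PyObject_Malloc. */
static char *doc_alloc(char domain, const char *doc) {
    size_t n = strlen(doc) + 1;
    header_t *h = (header_t *)malloc(sizeof(header_t) + n);
    if (h == NULL) {
        return NULL;
    }
    h->tag = domain;
    memcpy(h + 1, doc, n);
    return (char *)(h + 1);
}

/* Free a docstring regardless of which domain allocated it. */
static void doc_free_any(char *doc) {
    header_t *h;
    if (doc == NULL) {
        return;
    }
    h = (header_t *)doc - 1;
    switch (h->tag) {
    case 'm':
        mem_frees++;   /* recommended path going forward */
        break;
    case 'o':
    default:
        obj_frees++;   /* legacy or unrecognized: object-domain fallback */
        break;
    }
    free(h);
}
```

The `default` branch is where the backwards-compatibility fallback lives; if the workaround were later removed, as suggested in the discussion, that branch is what would go away.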
My inclination would be (3).
I would prefer 1) but 3) would probably be nicer. Definitely not 2) nor 4). If we go for 3), it should be a temporary workaround coupled with docs that communicate which allocator to use, and that the fallback/workaround will be removed in a future Python version. IMO.