cpython icon indicating copy to clipboard operation
cpython copied to clipboard

Assertion errors/stale cache in typeobject.c when tp_version_tag is not cleared.

Open Yhg1s opened this issue 3 years ago • 5 comments

The simple method cache in typeobject.c uses the tp_version_tag field and the Py_TPFLAGS_VALID_VERSION_TAG bit in the tp_flags field to track changes on the type that might invalidate the cache. Unfortunately the code currently assumes that any code clearing the tp_flag also clears the tp_version_tag, and checks this assumption with an assert. Unfortunately, many SWIG-generated extension modules violate this assertion.

SWIG prior to 4.1.0 cleared the tp_flags bit without touching the tp_version_tag field (https://github.com/swig/swig/blob/v4.0.2/Lib/python/pyrun.swg#L1230). This was unintentionally fixed in 4.1.0 by using PyType_Modified() instead, but unfortunately there are a great many SWIG-generated extension modules checked in all over the place (e.g. https://github.com/OSGeo/gdal/blob/6bd07b20b3e55c2fc94da611244a615a4fd2991f/swig/python/extensions/osr_wrap.cpp#L2296) so we cannot assume tp_version_tag is changed when the tp_flag bit is cleared.

The end result is an assertion error when assertions are enabled (I would implore everyone to run tests with assertions enabled...), or potentially a stale cache entry when they are not. It's hard to gauge the likelihood of the stale cache entries. In the case of SWIG it's extremely unlikely, as SWIG merely sets the this attribute on the type, which is not likely to be cached or looked up with _PyType_Lookup. For other cases where extension modules clear the tp_flag bit without changing the tp_version_tag it might be more likely.

(I have a PR to fix this already.)

  • PR: gh-99294

Yhg1s avatar Nov 09 '22 15:11 Yhg1s

Two problems:

  • 3.11's specialised instructions only use the version as a guard. This can be fixed if we guard the flag.

  • If SWIG just reset the flag without calling PyType_Modified, thats a bug. PyType_Modified also invalidates the type version for all subclasses of a modified type. So the current SWIG code would return wrong cached values even in pre-3.10 for subclasses that inherit a SWIG C type. There is no easy way to fix this on CPython side. In fact I rather not fix it as errors should never pass silently. This can cause really obscure bugs if we allow it through.

TLDR this is a nasty bug. There is no complete fix for CPython that accounts for all the intricacies of the method cache. The only safe thing is for SWIG to update itself.

Fidget-Spinner avatar Nov 09 '22 17:11 Fidget-Spinner

I guess it still makes sense to "fix" the specialized instructions in 3.11 so that this always-broken SWIG code isn't more broken than it was pre-3.11.

I think for 3.12 we should consider just removing Py_TPFLAGS_VALID_VERSION_TAG entirely. It is both undocumented and entirely redundant. There's no good reason anyone should write code that uses/sets or checks it.

carljm avatar Nov 09 '22 18:11 carljm

Fixing SWIG doesn't fix existing extension modules, unfortunately. Many projects check in SWIG-generated code. Some even edit it afterwards. (I think I cried when I found that out.) Also, this isn't just about SWIG, as other extension modules can do the same kind of thing. The problem is they would work right in Python 3.9 but not in 3.10 -- and worse, it wouldn't be apparent that it's not working right, because in most cases it doesn't matter. I imagine the specialisations in 3.11 and later make it more likely for things to go subtly wrong, but again it'll be situational and probably somewhat unlikely to be detected (not to mention a dickens to debug).

Removing Py_TPFLAGS_VALID_VERSION_TAG is, somewhat ironically, a much easier change to deal with, because it is immediately a compile error, rather than a silently lurking bug. (We still have backward compatibility to contend with.)

Yhg1s avatar Nov 09 '22 23:11 Yhg1s

There is a deeper issue here. We should be rejecting extensions that are binary incompatible with the version of CPython that is loading them.

We should reject extensions compiled for a different version, unless they are using the stable ABI. I'm surprised we don't have a version number for this. It shouldn't be too hard to compile a version number from header files into the DLL. Or am I missing something?

We should definitely remove Py_TPFLAGS_VALID_VERSION_TAG for 3.12.

markshannon avatar Nov 10 '22 12:11 markshannon

We used to have version numbers in extension modules (and to some extent still do) but that's not relevant here. This is a source incompatibility. Rebuilding doesn't change anything.

Removing the tp_flag in 3.12 will require a PEP 387 exemption.

Yhg1s avatar Nov 10 '22 13:11 Yhg1s

I was looking into deprecating Py_TPFLAGS_VALID_VERSION_TAG, but there are no references to it in the docs.

IMO, that means that Py_TPFLAGS_VALID_VERSION_TAG is undefined and this is a SWIG bug, not a CPython one.

I don't think we need to remove the flag right now, but we should definitely documented that it shouldn't be used.

markshannon avatar Feb 09 '23 10:02 markshannon

I will reiterate that this assert triggers in real world code, and a fair bit of it. (We initially worked around it by fixing the code in our version of SWIG, but we ran into quite a bit of checked-in, and sometimes checked-in-and-hand-edited, SWIG code in third-party projects.) I do not think it is a good idea to invalidate that many SWIG-generated extension modules for the sake of skipping one bitflag check, especially considering SWIG wasn't fixed until October last year (and upgrading SWIG is not always easy).

Yhg1s avatar Feb 09 '23 10:02 Yhg1s