gh-117139: Convert the evaluation stack to stack refs
Only tags all pointers 0b11 and NULL and immortal stuff as deferred for now.
- Issue: gh-117139
Could we hold off on this until 3.14? It's only a week until feature freeze for 3.13 (at which point main becomes 3.14), and this looks like a lot of churn in a time where we all would like stability to merge things that are actually needed in 3.13.
Could we hold off on this until 3.14? It's only a week until feature freeze for 3.13 (at which point main becomes 3.14), and this looks like a lot of churn in a time where we all would like stability to merge things that are actually needed in 3.13.
Alright. I always forget that ten other people are rushing in things at the same time as me. I don't think this will add any value for CPython 3.13 anyways at the moment.
Could we hold off on this until 3.14?
That makes sense. I'll start providing feedback and reviewing this now, but it won't be merged in 3.13.
Adding a lot more deferred objects exposes refleaks in the call sequence again. Need to investigate.
Can we have an issue for this. The linked issue is just about the tagging scheme, but this goes much deeper.
Without an isssue, there is nowhere to discuss design or other higher level problems.
For now, I'll comment on this PR.
How are generators and coroutines to be handled?
If a deferred reference to an object is held in a suspended generator, and there are no other references to that object, what happens?
When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.
Can we have an issue for this. The linked issue is just about the tagging scheme, but this goes much deeper.
Please comment on that issue, or we can rename the title to be more inclusive of the current PR.
I have made the requested changes; please review again
Namely:
- Manually convert most of the bytecodes.c to stackref, instead of making it magically done by the cases generator.
- Fix ownership problems when using steal or borrow.
- Tag all pointers
0b11.
To be conservative, this PR only defers immortal objects at the moment and NULL.
Thanks for making the requested changes!
@markshannon: please review the changes made to this pull request.
Note: I'm aware that certain decref options don't follow the semantics laid out in the issue. For now they're still safe because they will never operate on deferred objects. In PEP 703's plan, (non-immortal) floats and ints will not be deferred. So we're safe for now. However, in the future if we want to defer everything, then those need to be removed https://github.com/python/cpython/issues/118930.
5% performance hit on single-threaded default build: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20240511-3.14.0a0-144a6fa
I really think we need a design document (in an issue) that we can discuss before doing more work on this. I will attempt to summarize what I believe to be the design at the moment:
- Values on "localsplus", that is the fast locals and evaluation stack, of frames may be deferred. This includes suspended generators, so that deferred references may occur in the heap without being present in the frame stack
- A reference can only be deferred if it references an object whose class is in a fixed set of "deferrable" classes.
- Because deferred references can exist in the heap alone, objects that can have deferred references to them can only collected by scanning the heap.
- Calls to
Py_Dealloccan cause a garbage collection and updatingstacktopacross all calls toPy_Deallocand any other call that would escape the interpreter is deemed too expensive. - This has the following consequences:
- Since the stack pointer may not visible to the GC it must scan the whole stack including the dead part.
- Unused stack items must be initialized to NULL, so that they don't point to invalid memory.
- It is OK to scan dead pointers to deferred objects, as only the GC can free those objects.
- I don't understand how non-deferred objects are freed, as we cannot safely look at non-deferred pointers.
I really think we need a design document (in an issue)...
I'll add a more detailed summary of the GC changes to:
- https://github.com/python/cpython/issues/117376
I don't understand how non-deferred objects are freed, as we cannot safely look at non-deferred pointers
We don't look at non-deferred pointers above stacktop (beyond checking the tag).
To be clear, what we are describing here is an extra step in the GC that scans each thread's stack and ensures that deferred referenced objects from thread's stacks are kept alive. These frames are mostly not tracked and would not otherwise be considered by the GC.
We don't need to do anything special in this step for non-deferred references because those objects are kept alive by their reference count (i.e., computed gc_refs > 0), just like they are today.
I've updated https://github.com/python/cpython/issues/117376.
- We should ensure that this is at least perf-neutral in the default build. If it's not, we should disable the tagged pointers.
It's definitely going to be slower. Last we measured it was 5% slowdown. I guess I'll disable it for the default build then (for now).
We should ensure that this is at least perf-neutral in the default build. If it's not, we should disable the tagged pointers.
I think I'm going to ask the Faster CPython team whether they want to land the PR with a 3-5% hit on default builds as-is, or ifdef it out and land with maybe a 1% perf hit (this would be due to fusing NULL into a normal pointer, thus causing dup and close to require xincref and xdecref on default builds).
No slowdown on default build: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20240604-3.14.0a0-6a6bae2
5% slowdown on free-threaded build: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20240604-3.14.0a0-6a6bae2-NOGIL
Only tags all pointers 0b11 and NULL and immortal stuff as deferred for now.
Hey @Fidget-Spinner, could you update the PR description to be less criptic? I have no idea what you meant with this sentence. This PR is large enough that I think at least the PR description (and perhaps the commit message) should have a careful description of what this PR is doing. (The issue feels more like a discussion than a description of a specific design.)
I see that you removed PyObject_Vectorcall_StackRef and the other call variants. Thanks for doing that.
Could you do the same for _PyEval_UnpackIterableStackRef, _PyDict_FromStackRefItems, _PyBuildSlice_ConsumeStackRefs, and _PyUnicode_JoinStackRef.
Could you also convert _PyList_FromStackSteal and _PyTuple_FromStackSteal back to taking PyObject **?
That way the changes aren't leaking into the Object folder which should only be concerned with heap references.
Alright. I'll do that except for _PyEval_UnpackIterableStackRef. Reason being it's entire job is to write to the stack... so it should be aware of the stack anyways.
Latest performance results for the default build suggest a 1% speedup (might just be noise) https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20240608-3.14.0a0-78dbf36
When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.
I'm a bit worried about how many places we differ in what is the correct reference. E.g in a few places you have written Steal, and I think it should be Borrow, or vice-versa.
What checks can we add that will blow up if things are wrong?
I can think of one: For Steal variants: destroy the original reference.
I.e, in debug build:
#define _PyStackRef_FromPyObjectSteal(OP) \
_PyStackRef_FromPyObjectStealDebug(&(OP))
PyStackRef
_PyStackRef_FromPyObjectStealDebug(PyObject **obj)
{
PyStackRef result = _PyStackRef_FromPyObjectStealNonDebug(*obj);
*obj = (PyObject *)0xBADBADBADBADBAB;
return result;
}
Any ideas for other checks we can add?
I can think of one: For Steal variants: destroy the original reference. I.e, in debug build:
I tried to do that but it severely restricts uses of PyStackRef_FromPyObjectSteal and requires a lot more refactoring. For example, the following code is now invalid PyStackRef_FromPyObjectSteal(Py_NewRef(thing)) because it's not a valid lvalue to take an address from.
I'm not so sure what to do here.
I tried to do that but it severely restricts uses of
PyStackRef_FromPyObjectStealand requires a lot more refactoring.
That suggests to me that the current usage isn't correct, so the refactoring is necessary.
@markshannon I removed the type annotation because I noticed it's unsafe for _PyInterpreterFrame to be passed to be passed to these steal/dup/frompyobject functions. When tagging them, we check if they are immortal/deferred at the moment. Basically the API makes no sense for anything that is not a PyObject. Sam also wanted the manual conversion to PyStackRef to make it more salient that it's an unsafe operation.
I will address the rest of the reviews soon.
@markshannon I removed the type annotation because I noticed it's unsafe for
_PyInterpreterFrameto be passed to be passed to these steal/dup/frompyobject functions.
So don't tag them.
@markshannon I removed the type annotation because I noticed it's unsafe for
_PyInterpreterFrameto be passed to be passed to these steal/dup/frompyobject functions.So don't tag them.
Yeah but the code generator still emits a cast because it thinks it's trying to cast from PyObject, which the C compiler breaks and doesn't like.
Yeah but the code generator still emits a cast because it thinks it's trying to cast from PyObject, which the C compiler breaks and doesn't like.
Then fix the code generator 🙂
I think you only need to add .bits after stack_pointer[0] and the cast is valid.