cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-117139: Convert the evaluation stack to stack refs

Open Fidget-Spinner opened this issue 1 year ago • 12 comments

Only tags all pointers 0b11 and NULL and immortal stuff as deferred for now.

  • Issue: gh-117139

Fidget-Spinner avatar Apr 30 '24 21:04 Fidget-Spinner

Could we hold off on this until 3.14? It's only a week until feature freeze for 3.13 (at which point main becomes 3.14), and this looks like a lot of churn in a time where we all would like stability to merge things that are actually needed in 3.13.

gvanrossum avatar Apr 30 '24 21:04 gvanrossum

Could we hold off on this until 3.14? It's only a week until feature freeze for 3.13 (at which point main becomes 3.14), and this looks like a lot of churn in a time where we all would like stability to merge things that are actually needed in 3.13.

Alright. I always forget that ten other people are rushing in things at the same time as me. I don't think this will add any value for CPython 3.13 anyways at the moment.

Fidget-Spinner avatar Apr 30 '24 21:04 Fidget-Spinner

Could we hold off on this until 3.14?

That makes sense. I'll start providing feedback and reviewing this now, but it won't be merged in 3.13.

colesbury avatar Apr 30 '24 21:04 colesbury

Adding a lot more deferred objects exposes refleaks in the call sequence again. Need to investigate.

Fidget-Spinner avatar May 02 '24 21:05 Fidget-Spinner

Can we have an issue for this. The linked issue is just about the tagging scheme, but this goes much deeper.

Without an isssue, there is nowhere to discuss design or other higher level problems.

For now, I'll comment on this PR.

markshannon avatar May 09 '24 08:05 markshannon

How are generators and coroutines to be handled?

If a deferred reference to an object is held in a suspended generator, and there are no other references to that object, what happens?

markshannon avatar May 09 '24 08:05 markshannon

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

bedevere-app[bot] avatar May 09 '24 10:05 bedevere-app[bot]

Can we have an issue for this. The linked issue is just about the tagging scheme, but this goes much deeper.

Please comment on that issue, or we can rename the title to be more inclusive of the current PR.

Fidget-Spinner avatar May 09 '24 14:05 Fidget-Spinner

I have made the requested changes; please review again

Namely:

  1. Manually convert most of the bytecodes.c to stackref, instead of making it magically done by the cases generator.
  2. Fix ownership problems when using steal or borrow.
  3. Tag all pointers 0b11.

To be conservative, this PR only defers immortal objects at the moment and NULL.

Fidget-Spinner avatar May 10 '24 18:05 Fidget-Spinner

Thanks for making the requested changes!

@markshannon: please review the changes made to this pull request.

bedevere-app[bot] avatar May 10 '24 18:05 bedevere-app[bot]

Note: I'm aware that certain decref options don't follow the semantics laid out in the issue. For now they're still safe because they will never operate on deferred objects. In PEP 703's plan, (non-immortal) floats and ints will not be deferred. So we're safe for now. However, in the future if we want to defer everything, then those need to be removed https://github.com/python/cpython/issues/118930.

Fidget-Spinner avatar May 10 '24 21:05 Fidget-Spinner

5% performance hit on single-threaded default build: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20240511-3.14.0a0-144a6fa

Fidget-Spinner avatar May 12 '24 00:05 Fidget-Spinner

I really think we need a design document (in an issue) that we can discuss before doing more work on this. I will attempt to summarize what I believe to be the design at the moment:

  • Values on "localsplus", that is the fast locals and evaluation stack, of frames may be deferred. This includes suspended generators, so that deferred references may occur in the heap without being present in the frame stack
  • A reference can only be deferred if it references an object whose class is in a fixed set of "deferrable" classes.
  • Because deferred references can exist in the heap alone, objects that can have deferred references to them can only collected by scanning the heap.
  • Calls to Py_Dealloc can cause a garbage collection and updating stacktop across all calls to Py_Dealloc and any other call that would escape the interpreter is deemed too expensive.
  • This has the following consequences:
    • Since the stack pointer may not visible to the GC it must scan the whole stack including the dead part.
    • Unused stack items must be initialized to NULL, so that they don't point to invalid memory.
    • It is OK to scan dead pointers to deferred objects, as only the GC can free those objects.
    • I don't understand how non-deferred objects are freed, as we cannot safely look at non-deferred pointers.

markshannon avatar May 13 '24 10:05 markshannon

I really think we need a design document (in an issue)...

I'll add a more detailed summary of the GC changes to:

  • https://github.com/python/cpython/issues/117376

I don't understand how non-deferred objects are freed, as we cannot safely look at non-deferred pointers

We don't look at non-deferred pointers above stacktop (beyond checking the tag).

To be clear, what we are describing here is an extra step in the GC that scans each thread's stack and ensures that deferred referenced objects from thread's stacks are kept alive. These frames are mostly not tracked and would not otherwise be considered by the GC.

We don't need to do anything special in this step for non-deferred references because those objects are kept alive by their reference count (i.e., computed gc_refs > 0), just like they are today.

colesbury avatar May 13 '24 15:05 colesbury

I've updated https://github.com/python/cpython/issues/117376.

colesbury avatar May 13 '24 16:05 colesbury

  1. We should ensure that this is at least perf-neutral in the default build. If it's not, we should disable the tagged pointers.

It's definitely going to be slower. Last we measured it was 5% slowdown. I guess I'll disable it for the default build then (for now).

Fidget-Spinner avatar Jun 04 '24 07:06 Fidget-Spinner

We should ensure that this is at least perf-neutral in the default build. If it's not, we should disable the tagged pointers.

I think I'm going to ask the Faster CPython team whether they want to land the PR with a 3-5% hit on default builds as-is, or ifdef it out and land with maybe a 1% perf hit (this would be due to fusing NULL into a normal pointer, thus causing dup and close to require xincref and xdecref on default builds).

Fidget-Spinner avatar Jun 04 '24 07:06 Fidget-Spinner

No slowdown on default build: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20240604-3.14.0a0-6a6bae2

5% slowdown on free-threaded build: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20240604-3.14.0a0-6a6bae2-NOGIL

Fidget-Spinner avatar Jun 05 '24 11:06 Fidget-Spinner

Only tags all pointers 0b11 and NULL and immortal stuff as deferred for now.

Hey @Fidget-Spinner, could you update the PR description to be less criptic? I have no idea what you meant with this sentence. This PR is large enough that I think at least the PR description (and perhaps the commit message) should have a careful description of what this PR is doing. (The issue feels more like a discussion than a description of a specific design.)

gvanrossum avatar Jun 05 '24 16:06 gvanrossum

I see that you removed PyObject_Vectorcall_StackRef and the other call variants. Thanks for doing that. Could you do the same for _PyEval_UnpackIterableStackRef, _PyDict_FromStackRefItems, _PyBuildSlice_ConsumeStackRefs, and _PyUnicode_JoinStackRef. Could you also convert _PyList_FromStackSteal and _PyTuple_FromStackSteal back to taking PyObject **? That way the changes aren't leaking into the Object folder which should only be concerned with heap references.

markshannon avatar Jun 07 '24 16:06 markshannon

Alright. I'll do that except for _PyEval_UnpackIterableStackRef. Reason being it's entire job is to write to the stack... so it should be aware of the stack anyways.

Fidget-Spinner avatar Jun 07 '24 18:06 Fidget-Spinner

Latest performance results for the default build suggest a 1% speedup (might just be noise) https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20240608-3.14.0a0-78dbf36

Fidget-Spinner avatar Jun 08 '24 02:06 Fidget-Spinner

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

bedevere-app[bot] avatar Jun 11 '24 13:06 bedevere-app[bot]

I'm a bit worried about how many places we differ in what is the correct reference. E.g in a few places you have written Steal, and I think it should be Borrow, or vice-versa.

What checks can we add that will blow up if things are wrong?

I can think of one: For Steal variants: destroy the original reference. I.e, in debug build:

#define _PyStackRef_FromPyObjectSteal(OP) \
    _PyStackRef_FromPyObjectStealDebug(&(OP))

PyStackRef
_PyStackRef_FromPyObjectStealDebug(PyObject **obj)
{
    PyStackRef result = _PyStackRef_FromPyObjectStealNonDebug(*obj);
    *obj = (PyObject *)0xBADBADBADBADBAB;
    return result;
}

Any ideas for other checks we can add?

markshannon avatar Jun 11 '24 13:06 markshannon

I can think of one: For Steal variants: destroy the original reference. I.e, in debug build:

I tried to do that but it severely restricts uses of PyStackRef_FromPyObjectSteal and requires a lot more refactoring. For example, the following code is now invalid PyStackRef_FromPyObjectSteal(Py_NewRef(thing)) because it's not a valid lvalue to take an address from.

I'm not so sure what to do here.

Fidget-Spinner avatar Jun 11 '24 14:06 Fidget-Spinner

I tried to do that but it severely restricts uses of PyStackRef_FromPyObjectSteal and requires a lot more refactoring.

That suggests to me that the current usage isn't correct, so the refactoring is necessary.

markshannon avatar Jun 12 '24 09:06 markshannon

@markshannon I removed the type annotation because I noticed it's unsafe for _PyInterpreterFrame to be passed to be passed to these steal/dup/frompyobject functions. When tagging them, we check if they are immortal/deferred at the moment. Basically the API makes no sense for anything that is not a PyObject. Sam also wanted the manual conversion to PyStackRef to make it more salient that it's an unsafe operation.

I will address the rest of the reviews soon.

Fidget-Spinner avatar Jun 13 '24 14:06 Fidget-Spinner

@markshannon I removed the type annotation because I noticed it's unsafe for _PyInterpreterFrame to be passed to be passed to these steal/dup/frompyobject functions.

So don't tag them.

markshannon avatar Jun 13 '24 16:06 markshannon

@markshannon I removed the type annotation because I noticed it's unsafe for _PyInterpreterFrame to be passed to be passed to these steal/dup/frompyobject functions.

So don't tag them.

Yeah but the code generator still emits a cast because it thinks it's trying to cast from PyObject, which the C compiler breaks and doesn't like.

Fidget-Spinner avatar Jun 13 '24 16:06 Fidget-Spinner

Yeah but the code generator still emits a cast because it thinks it's trying to cast from PyObject, which the C compiler breaks and doesn't like.

Then fix the code generator 🙂

I think you only need to add .bits after stack_pointer[0] and the cast is valid.

markshannon avatar Jun 13 '24 16:06 markshannon