cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-143123: Protect against recursive tracer calls/finalization

Open Fidget-Spinner opened this issue 1 month ago • 9 comments

  • Issue: gh-143123

Fidget-Spinner avatar Dec 23 '25 23:12 Fidget-Spinner

Can we have a test for this?

markshannon avatar Dec 24 '25 10:12 markshannon

Can we have a test for this?

The reproducer provided by the OP requires installation of a package to crash. I don't think we can unit test this if that's the case.

Fidget-Spinner avatar Dec 24 '25 10:12 Fidget-Spinner

Is the problem that we are unable to distinguish between tracing happening in an invocation of _PyEval_EvalFrameDefault higher up the call stack and tracing happening in this invocation of _PyEval_EvalFrameDefault?

Yes I suspect that's the case.

Fidget-Spinner avatar Dec 24 '25 10:12 Fidget-Spinner

@Fidget-Spinner We could add a "jit_depth" field to the thread state, initialized to 0.

  • On entering _PyEval_EvalFrameDefault: if jit_depth is non-zero, increase it by one.
  • On leaving: if jit_depth is non-zero, decrease it by one
  • Don't start tracing if jit_depth is non-zero.
  • When starting tracing, set it to 1.
  • Tracing is active if jit _depth == 1
  • Assert that it is 1 anywhere we call the optimizer, start or stop tracing, etc.

markshannon avatar Dec 24 '25 11:12 markshannon

I realised we can just do the check in _PyJit_InitializeTracing, and leave the invariants to the rest of the optimizer.

Fidget-Spinner avatar Dec 25 '25 00:12 Fidget-Spinner

So I spent 30 minutes trying to create a simple reproducer to no avail. The following needs to happen:

  1. Outer Python -> C -> Inner Python calls.
  2. That just so happen to all be tracing at the same time.
  3. Outer traces, inner traces. Inner trace stomp on outer.

This is really hard to create a reproducer for as I'm not sure how to finangle the counters to be just right.

Fidget-Spinner avatar Dec 29 '25 02:12 Fidget-Spinner

Let's not worry about a reproducer, but ensure that tracing is active only once in any thread. Then we don't need to worry about "inner" or "outer" traces, as there will only be (at most) one trace.

markshannon avatar Jan 05 '26 12:01 markshannon

Let's not worry about a reproducer, but ensure that tracing is active only once in any thread. Then we don't need to worry about "inner" or "outer" traces, as there will only be (at most) one trace.

This has nothing to do with threads. It's recursive traces.

Fidget-Spinner avatar Jan 05 '26 13:01 Fidget-Spinner

Not once across all threads. Once per thread.

markshannon avatar Jan 05 '26 16:01 markshannon

Well, this PR does that. It fails if it detects another trace occuring during trace initialization.

Fidget-Spinner avatar Jan 07 '26 11:01 Fidget-Spinner

How do you detect when to stop tracing? If an exception is raised and tracing is active, but higher up the stack, how do you know not to stop tracing?

markshannon avatar Jan 08 '26 12:01 markshannon

How do you detect when to stop tracing? If an exception is raised and tracing is active, but higher up the stack, how do you know not to stop tracing?

It's protected by this https://github.com/python/cpython/blob/main/Python/optimizer.c#L720

Fidget-Spinner avatar Jan 08 '26 13:01 Fidget-Spinner

Maybe, but the current approach seems hard to follow and fragile. I'm fairly sure we don't always stop tracing cleanly when exceptions are raised.

I think we need a clear way to express the two queries:

  1. "is tracing happening in this invocation of PyEval_EvalFrame?", and
  2. "is tracing happening in this thread?"

And add some asserts that (1) always implies (2).

We should only start tracing if (2) is false, and only stop tracing if (1) is true.

I think we already have these two queries:

  1. IS_JIT_TRACING()
  2. tstate->jit_tracer_state.initial_state.code != NULL

We should probably add a is_tracing field to the thread state for clarity instead of the very unclear tstate->jit_tracer_state.initial_state.code != NULL

markshannon avatar Jan 09 '26 10:01 markshannon

@markshannon I updated to use a dedicated field to indicate tracing

Fidget-Spinner avatar Jan 09 '26 16:01 Fidget-Spinner

@markshannon I found a repro where not stopping early during exceptions actually cause issues https://github.com/python/cpython/issues/143358.

Added code to stop early when an exception is raised.

Fidget-Spinner avatar Jan 12 '26 00:01 Fidget-Spinner

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

bedevere-app[bot] avatar Jan 13 '26 10:01 bedevere-app[bot]