
gdb `py-bt` is no longer compatible with Python 3.13+

Open cielavenir opened this issue 2 months ago • 18 comments

Bug report

Bug description:

gdb ./python
r
Ctrl-C
py-bt
(gdb) py-bt
Traceback (most recent call first):
  (unable to read python frame information)

3.14.0b1 is affected and 3.14.0a7 is not affected.

I also bisected with git and found commit 1f5682f3a27516833f7c317707dd359280dba6e7. Reverting it on top of 3.14.0 fixes py-bt:

(gdb) py-bt
Traceback (most recent call first):
  File "/home/user/devel/cpython/Lib/_pyrepl/unix_console.py", line 439, in wait
    or bool(self.pollob.poll(timeout))
  File "/home/user/devel/cpython/Lib/_pyrepl/reader.py", line 703, in handle1
    self.console.wait(100)
  File "/home/user/devel/cpython/Lib/_pyrepl/reader.py", line 748, in readline
    self.handle1()
  File "/home/user/devel/cpython/Lib/_pyrepl/readline.py", line 395, in multiline_input
    return reader.readline()
  File "/home/user/devel/cpython/Lib/_pyrepl/simple_interact.py", line 143, in run_multiline_interactive_console
    statement = multiline_input(more_lines, ps1, ps2)
  File "/home/user/devel/cpython/Lib/_pyrepl/main.py", line 58, in interactive_console
    run_multiline_interactive_console(console)

This can be confirmed by building https://github.com/cielavenir/cpython/commits/3.14.0-pybt .

However, I don't know why "reenabling autovectorization" fixes py-bt.

There is a similar report, https://github.com/python/cpython/issues/127147, but this issue occurs without --enable-optimizations.

Tested on Debian 13, GCC 14

CPython versions tested on:

3.14

Operating systems tested on:

Linux

Linked PRs

  • gh-142941
  • gh-143371

cielavenir avatar Nov 29 '25 16:11 cielavenir

(let me repost in this thread)

On 3.13, the offending commit is https://github.com/python/cpython/commit/5646f6f73964a85eaf4757cb4c89eacb00a1670d and you can try https://github.com/cielavenir/cpython/tree/3.13.9-pybt

The official 3.13.3 release is affected, but 3.13.2 is not.

cielavenir avatar Nov 29 '25 16:11 cielavenir

I was able to reproduce this, but it's not clear to me what we should do about it. py-bt is failing because gcc is able to optimize the interpreter loop more effectively, which results in frame being optimized out:

https://github.com/python/cpython/blob/db098a475a47b16d25c88d95dbcf0c6572c68576/Tools/gdb/libpython.py#L1802-L1805

I don't think the current implementation of py-bt will work reliably when optimizations are enabled (we compile with -O3 when running ./configure without any additional arguments).

A couple of options come to mind:

  1. Use stronger language recommending that the debug build be used. It's only recommended right now. py-bt works fine for me on debian 13/gcc 14 when Python is configured with --with-pydebug.
  2. Modify py-bt so that it uses a different approach to get the top-most frame. I think we might be able to read the current thread state from TLS and then grab the top-most frame from the thread state. That shouldn't be susceptible to compiler optimizations.

mpage avatar Nov 30 '25 00:11 mpage

The thing is, we want a RelWithDebInfo flavor of build...

> we might be able to read the current thread state from TLS

I already tried this but it did not work

    def get_pyop(self):
        try:
            frame = self._gdbframe.read_var('frame')
            frame = PyFramePtr(frame)
            if not frame.is_optimized_out():
                return frame
            tstate = self._gdbframe.read_var('tstate')
            if tstate is None:
                return None
            frame = PyFramePtr(tstate.dereference()["current_frame"])
            if frame and not frame.is_optimized_out():
                return frame
            return None
        except ValueError:
            raise # return None

cielavenir avatar Dec 01 '25 06:12 cielavenir

> I already tried this but it did not work

What you have is reading tstate from the stack frame of the interpreter loop (and will potentially be affected in the same way as reading frame). I was suggesting reading the thread state from TLS, since that should be unaffected by compiler optimizations. Something like the following appears to work for me:

    def get_pyop(self):
        try:
            frame = self._gdbframe.read_var('frame')
            frame = PyFramePtr(frame)
            if not frame.is_optimized_out():
                return frame
            gilstate = self._gdbframe.read_var('_Py_tss_gilstate')
            if gilstate is None:
                return None
            frame = PyFramePtr(gilstate.dereference()["current_frame"])
            if frame:
                return frame
            return None
        except ValueError:
            return None

_Py_tss_gilstate is a TLS variable that stores the Python thread state associated with the current OS thread:

https://github.com/python/cpython/blob/eb892868b31322d7cf271bc25923e14b1f67ae38/Python/pystate.c#L75-L77
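The fallback order described above (try the interpreter loop's local first, then the thread-local state) can be sketched independently of gdb. This is a minimal illustration, not the code in `libpython.py`: `first_usable` and its reader callables are hypothetical names, and the only behavior assumed from gdb is what the snippets above already rely on, namely that a lookup may raise `ValueError` and that a value exposes `is_optimized_out`.

```python
def first_usable(*readers):
    """Return the first value that is present and not optimized out, else None.

    Each reader is a hypothetical zero-argument callable that returns a
    gdb.Value-like object, or raises ValueError when the symbol is missing.
    """
    for read in readers:
        try:
            value = read()
        except ValueError:
            continue  # symbol not visible from here; try the next source
        if value is None or getattr(value, "is_optimized_out", False):
            continue  # nothing readable; fall through to the next source
        return value
    return None
```

Inside gdb, the first reader might wrap `self._gdbframe.read_var('frame')` and the second might dereference `_Py_tss_gilstate` and take its `current_frame` field, as in the snippet above. Errors other than `ValueError` would need their own handling.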

mpage avatar Dec 02 '25 01:12 mpage

@mpage Thank you, but my test failed. Is _Py_tss_gilstate available only in 3.15?

edit: _Py_tss_tstate did not work either

cielavenir avatar Dec 02 '25 01:12 cielavenir

b8998fe2d8249565bf30ce6075ed678e1643f2a4 could be cherry-picked onto 3.14.0 without conflict, and your gdb update works, so I pushed https://github.com/cielavenir/cpython/tree/3.14.0-updategdb for reference

edit: But I don't know whether such a Python build can load binary-wheel modules built for the existing libpython3.14

cielavenir avatar Dec 02 '25 02:12 cielavenir

However, b8998fe2d8249565bf30ce6075ed678e1643f2a4 cannot be cherry-picked onto 3.13.9 because it causes conflicts.

cielavenir avatar Dec 02 '25 02:12 cielavenir

(Meanwhile, it seems that CFLAGS_NODIST="-O2" ./configure works, though the result cannot be used as a production binary.)

cielavenir avatar Dec 02 '25 06:12 cielavenir

You should not expect python-gdb.py to work on a release build which is highly optimized (PGO, LTO, -O3). In my experience, gdb is only fully reliable when Python is built without any optimization: with -O0. So we build Python in debug mode with -O0 in Fedora and RHEL: https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb

--with-pydebug uses -Og, which still causes issues when debugging Python in gdb.
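To see which of these cases a given interpreter falls into, one option is to inspect the compiler flags recorded at build time. The sketch below is only an illustration: `last_opt_level` is a hypothetical helper, and the set of variable names queried is an assumption about what the build's Makefile recorded (`sysconfig.get_config_var` returns `None` for names it does not know).

```python
import re
import sysconfig


def last_opt_level(flags):
    """Return the last -O<level> flag in a compiler flag string, or None."""
    # The last -O flag on a command line is the one the compiler honors.
    matches = re.findall(r"(?<!\S)-O(?:[0-3]|s|g|z|fast)?(?!\S)", flags or "")
    return matches[-1] if matches else None


if __name__ == "__main__":
    # Variable names are assumptions; missing ones simply print None.
    for name in ("PY_CFLAGS", "PY_CFLAGS_NODIST", "OPT"):
        print(name, "->", last_opt_level(sysconfig.get_config_var(name)))
    # Py_DEBUG is typically truthy for --with-pydebug builds.
    print("Py_DEBUG ->", sysconfig.get_config_var("Py_DEBUG"))
```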

I don't think that we can do anything about this issue and I even suggest closing the issue.

vstinner avatar Dec 02 '25 16:12 vstinner

What do you think of https://github.com/python/cpython/issues/127147 then?

edit: it says py-bt was working on 3.12

cielavenir avatar Dec 03 '25 00:12 cielavenir

I think it's worth making py-bt work with -Og builds and release builds. That would save me a lot of time debugging crashes. Sometimes I get a coredump from a release or --with-pydebug build that's hard to reproduce. I end up having to manually examine _Py_tss_tstate or _Py_tss_gilstate to try to reconstruct the Python stack, which is painful and slow.

Using _Py_tss_gilstate in 3.15 seems relatively straightforward.

colesbury avatar Dec 04 '25 19:12 colesbury

> I end up having to manually examine _Py_tss_tstate or _Py_tss_gilstate to try to reconstruct the Python stack

Would you mind showing some gdb examples of how you inspect _Py_tss_tstate or _Py_tss_gilstate? Do you start by inspecting _Py_tss_gilstate->current_frame? _Py_tss_tstate is NULL if the GIL is released.

vstinner avatar Dec 17 '25 16:12 vstinner

Yes, I look at _Py_tss_gilstate->current_frame (or _Py_tss_tstate in older versions when the GIL is not released). It's usually something like:

 p ((PyCodeObject*)(_Py_tss_gilstate->current_frame.f_executable.bits & ~1))->co_name
 p ((PyCodeObject*)(_Py_tss_gilstate->current_frame->previous.f_executable.bits & ~1))->co_name
...
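The two expressions above follow one pattern: clear the low bit of `f_executable.bits` to recover a pointer, then step to `previous` and repeat. A minimal sketch of that walk on stand-in objects follows. `FakeFrame`, `untag`, and `walk` are hypothetical names used only for illustration, and treating only the lowest bit as a tag is an assumption taken directly from the `& ~1` in the expressions.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

TAG_MASK = 1  # assumption: only the lowest bit is a tag, as in `& ~1` above


def untag(bits: int) -> int:
    """Clear the tag bit to recover the underlying pointer value."""
    return bits & ~TAG_MASK


@dataclass
class FakeFrame:
    """Stand-in for an interpreter frame: tagged executable bits plus a link."""
    executable_bits: int
    previous: Optional["FakeFrame"] = None


def walk(frame: Optional[FakeFrame]) -> Iterator[int]:
    """Yield untagged executable pointers from the newest frame outward."""
    while frame is not None:
        yield untag(frame.executable_bits)
        frame = frame.previous
```

The loop stops when `previous` is empty, which is the same termination condition one would check by hand when repeating the gdb expression for each older frame.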

colesbury avatar Dec 17 '25 16:12 colesbury

> I think it's worth making py-bt work with -Og builds and release builds. That would save me a lot of time debugging crashes. Sometimes I get a coredump from a release or --with-pydebug build that's hard to reproduce. I end up having to manually examine _Py_tss_tstate or _Py_tss_gilstate to try to reconstruct the Python stack, which is painful and slow.

I wrote the PR https://github.com/python/cpython/pull/142941 for that.

vstinner avatar Dec 18 '25 14:12 vstinner

@vstinner - thanks! I adapted your PR and merged the logic back into the py-bt and py-bt-full commands:

  • https://github.com/python/cpython/pull/143371

Still doing some more testing, but it seems to work in optimized builds as long as you don't strip the debug info out.

colesbury avatar Jan 02 '26 19:01 colesbury

@colesbury Thank you, but this gdb.py did not work for me (it works when I apply https://github.com/cielavenir/cpython/commit/21b837798dee991c7728b848855991052619bed8 but the original issue is still there). Which Python versions will be compatible?

cielavenir avatar Jan 03 '26 18:01 cielavenir

Btw I see this (with or without my hack patch)

(gdb)  p ((PyCodeObject*)(_Py_tss_tstate->current_frame.f_executable.bits & ~1))->co_name
Cannot access memory at address 0x48

cielavenir avatar Jan 03 '26 18:01 cielavenir

@cielavenir please try the latest version of the PR. It should work with 3.13+

colesbury avatar Jan 05 '26 17:01 colesbury