Using debuginfo for better backtraces

Open thesamesam opened this issue 1 year ago • 7 comments

This is maybe a better example of the kind of thing I was talking about in https://github.com/oracle/dtrace-utils/issues/84.

With splitdebug (built with -ggdb3, with the debuginfo in /usr/lib/debug and a stripped less binary), ustack() output is not very friendly:

$ sudo dtrace -n 'syscall::fsync*:return,syscall::sync*:return { ustack(); }'
[...]
  6 119674                     fsync:return
              libc.so.6`fsync+0x10
              less`0x5ea421eafd4d
              0x5ea421eb85bd
              0x5ea421eafa30
              0x7796ac9e5407
              0x7ffc1f5cecc3
              0x2f65686361632f72

In this case, I genuinely didn't know that less would ever call fsync, so I was curious as to where from! But the backtrace isn't so helpful there.

I get better output if I disable stripping and use -fno-omit-frame-pointer:

$ sudo dtrace -n 'syscall::fsync*:return,syscall::sync*:return { ustack(); }'
 25 119674                     fsync:return
              libc.so.6`fsync+0x10
              less`quit+0x5d
              less`commands+0x83d
              0x59897f425a30
              0x78803df45407
              0x7ffd79035cc3
              0x2f65686361632f72

It's not perfect, but it's more than enough for me to pin down what's going on.
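As a rough illustration of why -fno-omit-frame-pointer helps, here is a minimal sketch of a frame-pointer stack walk. It assumes the x86-64 convention (saved caller frame pointer at [fp], return address at [fp + 8]) and that the unwinder follows this chain, which the improved trace suggests but the issue does not state. The `read_word` callback and all addresses are hypothetical.

```python
def walk_frame_pointers(read_word, fp, max_frames=64, word=8):
    """Collect return addresses by following the saved-frame-pointer chain.

    read_word(addr) is a hypothetical callback returning the word stored at
    addr, or None if that address cannot be read.
    """
    pcs = []
    for _ in range(max_frames):
        if fp == 0:
            break
        saved_fp = read_word(fp)         # caller's frame pointer
        ret = read_word(fp + word)       # return address into the caller
        if saved_fp is None or ret is None:
            break
        pcs.append(ret)
        # The stack grows down, so each caller's frame must sit at a higher
        # address. Anything else means the chain is broken or finished.
        if saved_fp <= fp:
            break
        fp = saved_fp
    return pcs


# Hypothetical stack contents: three frames linked through saved frame pointers.
memory = {
    0x7000: 0x7020, 0x7008: 0x105D,
    0x7020: 0x7040, 0x7028: 0x283D,
    0x7040: 0x0,    0x7048: 0x3000,
}
print([hex(pc) for pc in walk_frame_pointers(memory.get, 0x7000)])
```

When a function does not maintain the frame pointer, the register holds arbitrary data and a walk like this returns garbage. The last entry in both traces above, 0x2f65686361632f72, reads as the ASCII bytes "r/cache/" in little-endian order, which looks like a walk that ran past the last valid frame.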

Could DTrace learn to read DWARF (elfutils should be able to do this, including understanding splitdebug and so on) for backtraces?
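For context, a minimal sketch of how a consumer might locate split debuginfo for a binary. It assumes the build-ID directory layout that GDB and elfutils conventionally search under /usr/lib/debug; the issue only says the debuginfo lives in /usr/lib/debug, so the exact layout on this system is an assumption, and `build_id_debug_path` is a hypothetical helper, not a DTrace or elfutils function.

```python
from pathlib import PurePosixPath


def build_id_debug_path(build_id_hex, debug_root="/usr/lib/debug"):
    """Return the conventional path of the separate debuginfo file for a build ID.

    The first two hex digits name a directory and the remaining digits name
    the file, with a ".debug" suffix.
    """
    build_id_hex = build_id_hex.lower()
    if len(build_id_hex) < 3 or any(c not in "0123456789abcdef" for c in build_id_hex):
        raise ValueError("build ID must be a hex string of at least 3 digits")
    return (
        PurePosixPath(debug_root)
        / ".build-id"
        / build_id_hex[:2]
        / (build_id_hex[2:] + ".debug")
    )


# Hypothetical build ID, shortened for readability.
print(build_id_debug_path("ABCDEF0123"))
```

The build ID itself comes from the binary's .note.gnu.build-id note; reading it is left out of this sketch.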

thesamesam, Aug 28 '24 02:08

We can certainly look at adding this as optional support: if debuginfo is available, it would make sense to use it, as long as that does not negatively impact trace processing. Anything that improves backtraces without adding to the runtime dependencies in general is good.

kvanhees, Aug 28 '24 16:08

There are two distinct issues here: DTrace wants backtrace info for reliable stack traces (which has to be something the kernel can understand; hopefully, in the future, SFrame will do here), and DTrace's userspace wants a symbol table for symbol lookups. The latter only works for longer-running traces, where the process hasn't already died before userspace gets its hands on the trace, and even then it is troublesome for main programs, which are routinely stripped. Solaris implemented an .ldynsym section for just this, but the Linux approach seems to have been quite different: a section containing a compressed ELF executable (!!) which only has symbol table sections in it. We do not yet handle this crazy thing, and in my last trials relatively few binaries were built with it at all. We do need a symtab from somewhere.
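To make the symbol-lookup half concrete, here is a minimal sketch of resolving an address against a sorted symbol table and printing it in the module`symbol+offset form that ustack() uses. The symbol table contents and addresses are hypothetical, and this is not DTrace's actual lookup code.

```python
import bisect


def symbolize(addr, symbols, module):
    """Format addr as module`symbol+0xoff, or module`0xaddr if nothing matches.

    symbols is a list of (start, size, name) tuples sorted by start address,
    as might be built from an ELF .symtab or .dynsym.
    """
    starts = [start for start, _size, _name in symbols]
    # Find the last symbol whose start address is <= addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = symbols[i]
        if addr < start + size:
            return f"{module}`{name}+{addr - start:#x}"
    # No covering symbol: this is what a stripped binary looks like.
    return f"{module}`{addr:#x}"


# Hypothetical symbol table for an unstripped binary.
symtab = [(0x1000, 0x80, "quit"), (0x2000, 0x1000, "commands")]
print(symbolize(0x105D, symtab, "less"))
print(symbolize(0x283D, symtab, "less"))
print(symbolize(0x105D, [], "less"))
```

With an empty table the function can only print the raw address, which matches the shape of the stripped trace at the top of this issue.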

I'd be happy to add some sort of symbol server support, but I don't think Linux has any such thing either...

nickalcock, Aug 30 '24 16:08

> a section containing a compressed ELF executable

I'm pretty sure this is MiniDebugInfo (.gnu_debugdata). It looks like only Fedora ships with it by default (?) but I'd be open to us doing it in Gentoo.
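A minimal sketch of what consuming MiniDebugInfo involves, assuming the .gnu_debugdata section bytes have already been pulled out of the binary (for example with an ELF library or objcopy). The section holds an xz/LZMA-compressed ELF file, so the core step is a decompression followed by a sanity check. `unpack_gnu_debugdata` is a hypothetical helper name.

```python
import lzma

ELF_MAGIC = b"\x7fELF"


def unpack_gnu_debugdata(section_bytes):
    """Decompress the contents of a .gnu_debugdata section.

    Returns the bytes of the embedded ELF file, which would then be parsed
    for its symbol table like any other ELF object.
    """
    embedded = lzma.decompress(section_bytes)  # auto-detects the xz container
    if not embedded.startswith(ELF_MAGIC):
        raise ValueError("decompressed .gnu_debugdata is not an ELF file")
    return embedded


# Stand-in for real section contents: a fake 16-byte ELF identification block.
fake_elf = ELF_MAGIC + bytes(12)
section = lzma.compress(fake_elf)
print(len(unpack_gnu_debugdata(section)))
```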

One question is whether we want to try to lead some standardisation effort to make it a proper compressed section. But that would delay things substantially.

> I'd be happy to add some sort of symbol server support, but I don't think Linux has any such thing either...

Isn't that debuginfod? What am I missing?

thesamesam, Aug 30 '24 23:08

It's debuginfod, but dtrace doesn't know how to request symbol info from there...
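For reference, a minimal sketch of what such a request looks like. debuginfod serves artifacts over HTTP at /buildid/BUILDID/debuginfo (and /executable), and clients conventionally take a space-separated server list from the DEBUGINFOD_URLS environment variable. The helper name and example servers below are hypothetical, and the sketch only builds the URLs; it does not fetch anything.

```python
import os


def debuginfod_urls(build_id_hex, artifact="debuginfo", env=None):
    """Return the candidate URLs for one build ID, one per configured server."""
    env = os.environ if env is None else env
    servers = env.get("DEBUGINFOD_URLS", "").split()
    return [
        f"{server.rstrip('/')}/buildid/{build_id_hex.lower()}/{artifact}"
        for server in servers
    ]


# Hypothetical servers and build ID.
example_env = {"DEBUGINFOD_URLS": "https://debuginfod.example.org/ https://mirror.example.net"}
for url in debuginfod_urls("ABCDEF0123", env=example_env):
    print(url)
```

A real client would more likely go through elfutils' libdebuginfod, which also handles caching and timeouts.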

nickalcock, Oct 11 '24 16:10

The conclusion at Cauldron wrt standardising MiniDebugInfo from people I spoke to was basically "you can if you want, but I wouldn't worry that much over it" and that the only real thing to do there is improve find-debuginfo.sh from debugedit so that Fedora and Gentoo are using the same tooling. I still need to decide if we want to investigate adopting it more on our side.

thesamesam, Oct 11 '24 17:10

While we're at it let's fix things so that find-debuginfo.sh doesn't strip out the CTF...

nickalcock, Nov 06 '24 18:11

Some more references:

  • https://github.com/bpftrace/bpftrace/issues/1006
  • https://github.com/bpftrace/bpftrace/issues/1744
  • https://sourceware.org/systemtap/SystemTap_Beginners_Guide/ustack.html (stap can do this already but surely only with its non-BPF mode)

We discussed this a bit on IRC the other day. Having SFrame support in the kernel will help a lot (as it should then be visible via BPF), but I still think the DWARF parsing via elfutils is going to be needed on the userland side.

thesamesam, Jul 06 '25 21:07