drgn icon indicating copy to clipboard operation
drgn copied to clipboard

Provide source code info for Symbol

Open PaulZ-98 opened this issue 3 years ago • 4 comments

For function symbols, provide the source code file, optional line number, and optional column. This is useful for noting the source code file/line/col when disassembling a function. In the code, look up the module by the function Symbol address, and use the module to help get the line number.

root@ub20-sdb:~/drgn_dev/drgn# drgn
drgn 0.0.19+2.g57320ac.dirty (using Python 3.8.10, elfutils 0.176, with libkdumpfile)
For help, type help(drgn).
>>> import drgn
>>> from drgn import NULL, Object, cast, container_of, execscript, offsetof, reinterpret, sizeof
>>> from drgn.helpers.linux import *
>>> 
>>> prog.symbol('mutex_lock').source(offset=0, line_number=True, line_column=False)
'/build/linux-hwe-5.8-Ox7OXf/linux-hwe-5.8-5.8.0/kernel/locking/mutex.c: 280'
>>> prog.symbol('mutex_lock').source(offset=42, line_number=True, line_column=False)
'/build/linux-hwe-5.8-Ox7OXf/linux-hwe-5.8-5.8.0/kernel/locking/mutex.c: 284'
>>> 

First drgn PR, so was unsure how to develop a test for this... let me know. Also how to create drgn_Symbol_source_DOC instead of putting in a string directly? Thanks!

PaulZ-98 avatar Jun 09 '22 15:06 PaulZ-98

Hi @PaulZ-98, I just wanted to provide some feedback on this. I'm not a maintainer so Omar's opinion will matter more here, but he may not be able to get back to you for a week or two.

First, this is a great idea, thanks for contributing it!

Second, regarding your question:

Also how to create drgn_Symbol_source_DOC instead of putting in a string directly?

These are generated from the _drgn.pyi stubs file directly. (See the script docs/exts/drgndoc/docstrings.py). You've already added the docstring to the stub file, so you should be able to just use drgn_Symbol_source_DOC directly, and it will be generated. If not, try doing a clean build.

Third, I have a few notes regarding the approach.

  • The stack frame object provides this information already. I know that it can't satisfy all the use cases that this interface could, since stack frames are limited to whatever is currently on some thread's stack. But I think the API is a bit better - it simply returns a tuple of (filename, line, column), and leaves the string formatting up to the user. It would be more consistent if your API ultimately returned the same tuple, and it would also ensure that users wouldn't need to parse the string returned by Symbol.source() if they did care about the column or line and wanted them as integers.
  • I'm not sure that this interface belongs with the Symbol object. Symbols may be a lot of things - code and data, which should have source information, but also untyped linker symbols or section names, or a variety of other stuff that doesn't really relate to a line in a source file. Symbols also have a lot of unpleasantness: namely, they aren't unique! The same symbol name can appear multiple times, and the same address can be contained by multiple symbols (thus the Prog.symbols() API). This does really happen due to static inline functions, or due to name collisions for static functions in different compilation units. Imagine you have an address and you'd like to get some source info about it. First, you'd need to create a symbol and deal with all of the above, including the possibility that there's no symbol containing that address -- something which seems distinctly possible to me. It seems that it would be better if we just had some sort of method prog.source_info(address), which would take a virtual address and return the tuple mentioned above. It would still be easy to use this API if you already have a Symbol, but it would prevent you from needing to interact with Symbols if you don't need to.

I'll also put a few code review notes in the PR itself, which should help regardless of whether Omar agrees with me on the approach.

Again, thanks for this and I hope to see it merged :)

brenns10 avatar Jun 27 '22 20:06 brenns10

Thanks for reviewing @brenns10 I will go through your comments and start making changes.

PaulZ-98 avatar Jun 28 '22 16:06 PaulZ-98

I think I've covered most of your comments @brenns10. I can always move the code out of Symbol if that's what we decide to do.

PaulZ-98 avatar Jul 14 '22 15:07 PaulZ-98

I'd love to have something like prog.source_info(address) implemented.

My use-case is crush, where I pair drgn with capstone to have a nice disassembler (see disassemble.py), and the only thing I miss now is translating arbitrary virtual address into a source code info.

pfactum avatar Sep 22 '22 20:09 pfactum