[Draft] Kallsyms Symbol Finder

Open brenns10 opened this issue 2 years ago • 0 comments

Now that #241 is no longer a draft, I'm putting the next branch that builds upon it here for easy review. Unfortunately I can't set the base branch to be my own symbol_finder branch, so the PR currently includes the changes from #241 as well.

This branch allows the kernel's built-in kallsyms information to be used as a symbol table. For best results, the kernel should be built with CONFIG_KALLSYMS_ALL, so that data symbols are included as well as functions. This only provides symbols for the kernel itself: no modules. There are two ways to get at the information:

  1. For live systems with root permissions, we can directly parse the text contents of /proc/kallsyms. This works on practically any kernel version!
  2. For vmcores (or live systems where /proc/kcore is unavailable, maybe due to permissions, see #347), we can parse the in-memory data structures that contain the kallsyms info. This requires upstream changes, merged in v6.0, that add kallsyms symbol information to the vmcoreinfo note. In particular, if f09bddbd8661 ("vmcoreinfo: add kallsyms_num_syms symbol") is present, then this should work.
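To make the first approach concrete, here is a minimal Python sketch of parsing /proc/kallsyms-style text. It is not the code in this branch (which lives in libdrgn); `parse_kallsyms_text` is a hypothetical name, and the sample addresses and the `example_mod` module are made up. It skips module symbols, and it returns None when every address is zero, which is what an unprivileged reader typically sees.

```python
def parse_kallsyms_text(text):
    """Parse /proc/kallsyms-style text into (address, type, name) tuples.

    Lines with a trailing "[module]" field are skipped, since only vmlinux
    symbols are wanted. Returns None if all addresses are zero, which
    suggests the file was read without permission to see real addresses.
    """
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed line
        if len(fields) > 3 and fields[3].startswith("["):
            continue  # module symbol, out of scope here
        symbols.append((int(fields[0], 16), fields[1], fields[2]))
    if symbols and all(address == 0 for address, _, _ in symbols):
        return None
    return symbols


# Made-up sample input; on a live system this would be the file contents.
SAMPLE = """\
ffffffff81000000 T _text
ffffffffa58e20a0 D slab_caches
ffffffffc0001000 t helper_fn\t[example_mod]
"""

for address, kind, name in parse_kallsyms_text(SAMPLE):
    print(f"{address:#x} {kind} {name}")
```

Checking for "all zero" rather than "any zero" is deliberate in this sketch: a few legitimate symbols can sit at address zero even when the file is read as root.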

The API I chose represents the kallsyms finder as a Python object, which can be registered via add_symbol_finder(). I didn't want to hook into any of the add_debug_info() logic because I wanted maximum flexibility - most people won't want kallsyms, at least not initially. It also has the benefit of not breaking any existing logic.
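As a rough illustration of what "a finder as a Python object" can look like, here is a standalone sketch of a callable taking (name, address, one), the shape used in the example session in this description. The argument meanings are assumed: None means "match anything", and one=True means "return at most one symbol". `FakeSymbol` and `make_table_finder` are hypothetical stand-ins so the sketch runs without drgn; the real finder returns drgn Symbol objects.

```python
import bisect
from collections import namedtuple

# Stand-in for drgn's Symbol type (assumption: only these fields matter here).
FakeSymbol = namedtuple("FakeSymbol", ["name", "address", "size"])


def make_table_finder(symbols):
    """Return a finder callable over a fixed list of FakeSymbol entries."""
    by_address = sorted(symbols, key=lambda s: s.address)
    starts = [s.address for s in by_address]
    by_name = {}
    for sym in by_address:
        by_name.setdefault(sym.name, []).append(sym)

    def finder(name, address, one):
        if name is not None:
            matches = list(by_name.get(name, []))  # dict lookup by name
        elif address is not None:
            # Binary search for the last symbol starting at or before address.
            i = bisect.bisect_right(starts, address) - 1
            matches = [by_address[i]] if i >= 0 else []
        else:
            matches = list(by_address)
        if address is not None:
            # Keep only symbols whose [start, start + size) range contains it.
            matches = [s for s in matches
                       if s.address <= address < s.address + s.size]
        return matches[:1] if one else matches

    return finder


# Made-up table; addresses and sizes are illustrative only.
finder = make_table_finder([
    FakeSymbol("slab_caches", 0x1000, 0x20),
    FakeSymbol("jiffies", 0x2000, 0x8),
])
print(finder("slab_caches", None, True))
print(finder(None, 0x1010, True))
```

Because the finder is just a callable, it can be exercised on its own before being registered, which is how the session in this description uses it. Note that this sketch returns only one candidate for an address-only lookup, so aliases sharing an address are not all reported.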

This can be used on Oracle Linux 7-9 with UEK 5-7, but it can also be used on the vmtest kernels, which serves as a good way to explore:

$ python3 -m vmtest.vm -k 6.4*  # any kernel 6.0 or later will do
[... boot output...]
# umount /lib/modules/$(uname -r)
# python -m drgn
>>> finder = make_kallsyms_vmlinux_finder(prog)
>>> finder("slab_caches", None, True)
[Symbol(name='slab_caches', address=0xffffffffa58e20a0, size=0x20, binding=<SymbolBinding.GLOBAL: 2>, kind=<SymbolKind.OBJECT: 1>)]
>>> prog.add_symbol_finder(finder)
>>> prog.symbol("slab_caches")
Symbol(name='slab_caches', address=0xffffffffa58e20a0, size=0x20, binding=<SymbolBinding.GLOBAL: 2>, kind=<SymbolKind.OBJECT: 1>)

No automatic testing just yet (waiting on the Symbol Finder API to be stabilized and merged). However, it will be interesting to test, since ideally we would want to test both the text-based and vmcore-based parsing methods. I may want to add a toggle to allow bypassing /proc/kcore so that we can test the other method.


Some notes on fixes / To-dos for this branch:

  • [ ] Obviously wait for #241 to be merged
  • [ ] If #347 is merged, I need to be careful to detect a /proc/kallsyms where all the addresses are zero, and bail out of that code path, since non-root users can still read it without memory addresses.
  • [ ] In hindsight, I think the KallsymsFinder() constructor is bad. I wanted to move additional parsing of the vmcoreinfo note out into the Python code. Now, I think libdrgn/kallsyms.c should be able to find the necessary information from the vmcoreinfo without the Python helper code.
  • [ ] There are some compatibility issues with the new "long symbol" support for Rust. The kallsyms data structure format changed with no indication. Currently I am detecting this via the kernel major version, but I really ought to send a patch with a Fixes: tag to specify a NUMBER(kallsyms_version)=2 in the vmcoreinfo, so that we could detect it without version number hacks.
  • [ ] Address lookup is reasonably efficient using bsearch(). However, name lookup is currently linear. I need to add a hash table.
  • Note: module support is not planned as part of this branch. Module kallsyms requires type & object finders for vmlinux. I have Python helper code stuffed at the end of my ctf branch which implements a module kallsyms finder. That might be worth porting to C later on.
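A sketch of the vmcoreinfo-related to-dos above, in Python for brevity even though the real logic would live in libdrgn/kallsyms.c. It parses the SYMBOL(...) and NUMBER(...) lines of a vmcoreinfo note, then decides whether the long-symbol format applies. NUMBER(kallsyms_version) is only the proposal from the list above, not something the kernel is known to emit, so the code falls back to comparing OSRELEASE against a caller-supplied cutoff. The function names, the sample values, and the (6, 1) cutoff used in the example are all assumptions.

```python
import re

_ENTRY = re.compile(r"^(SYMBOL|NUMBER)\((\w+)\)=(\S+)$")
_RELEASE = re.compile(r"(\d+)\.(\d+)")


def parse_vmcoreinfo(text):
    """Split vmcoreinfo text into (symbols, numbers, other) dicts."""
    symbols, numbers, other = {}, {}, {}
    for line in text.splitlines():
        match = _ENTRY.match(line)
        if match:
            kind, name, value = match.groups()
            if kind == "SYMBOL":
                symbols[name] = int(value, 16)  # addresses are hex, no 0x
            else:
                numbers[name] = int(value)  # numbers are decimal
        elif "=" in line:
            key, _, value = line.partition("=")
            other[key] = value
    return symbols, numbers, other


def uses_long_symbols(numbers, other, first_long_release):
    """Guess whether the kallsyms tables use the long-symbol encoding.

    first_long_release is a (major, minor) tuple supplied by the caller.
    """
    version = numbers.get("kallsyms_version")  # proposed entry, may be absent
    if version is not None:
        return version >= 2
    match = _RELEASE.match(other.get("OSRELEASE", ""))
    if match is None:
        raise ValueError("cannot determine kernel release")
    return (int(match.group(1)), int(match.group(2))) >= first_long_release


# Made-up note contents; the address is illustrative only.
SAMPLE = """\
OSRELEASE=6.4.0
SYMBOL(kallsyms_num_syms)=ffffffff829ffff8
"""

symbols, numbers, other = parse_vmcoreinfo(SAMPLE)
print(hex(symbols["kallsyms_num_syms"]))
# (6, 1) is a placeholder cutoff, not a verified release boundary.
print(uses_long_symbols(numbers, other, (6, 1)))
```

With an explicit version number in the note, the release comparison would never run, which is the point of the proposed patch.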

brenns10 Aug 22 '23 21:08