Support for syscall metadata such as the argument map
I am working on improving lurk -- an strace Rust rewrite. At the moment, lurk's author @JakWai01 has created a big manual map for the x86_64 syscalls, but the approach is very hard to maintain and does not support any other architectures.
The required metadata would be:
- how many arguments each syscall has
- the types of each argument: str, int, address, ...?
- syscall return type: Error(i32), Success(i32), Address(usize), ...?
- possibly some "group" that describes what each syscall belongs to: File, IPC, Memory, Creds, ...
The groups are a bit tricky as I am not certain if Linux officially describes each syscall in terms of which group(s) it belongs to - so this might be omitted.
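To make the wish list above concrete, here is a minimal sketch of what such metadata could look like as plain Rust data. All names (`ArgKind`, `RetKind`, `SyscallInfo`, `decode_ret`) are hypothetical and not part of the syscalls crate. The only kernel fact relied on is that Linux reports errors as a negative errno in the range -4095..=-1.

```rust
/// Coarse argument kinds (hypothetical names, not part of any existing crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArgKind {
    Int,
    UInt,
    Str,
    Address,
}

/// How a successful return value should be interpreted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetKind {
    Int,
    Address,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Arg {
    pub name: &'static str,
    pub kind: ArgKind,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SyscallInfo {
    pub name: &'static str,
    pub args: &'static [Arg],
    pub ret: RetKind,
}

/// Decoded return value, mirroring the variants listed in the issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ret {
    Error(i32),
    Success(isize),
    Address(usize),
}

/// Example entry; the argument kinds chosen here are illustrative.
pub const READ: SyscallInfo = SyscallInfo {
    name: "read",
    args: &[
        Arg { name: "fd", kind: ArgKind::Int },
        Arg { name: "buf", kind: ArgKind::Str },
        Arg { name: "count", kind: ArgKind::UInt },
    ],
    ret: RetKind::Int,
};

/// Interpret the raw return register according to the syscall's `RetKind`.
pub fn decode_ret(kind: RetKind, raw: isize) -> Ret {
    // Values in -4095..=-1 are negated errno codes.
    if (-4095..0).contains(&raw) {
        return Ret::Error((-raw) as i32);
    }
    match kind {
        RetKind::Int => Ret::Success(raw),
        RetKind::Address => Ret::Address(raw as usize),
    }
}
```

With this shape, `decode_ret(READ.ret, raw)` is all a tracer needs to print a return value, and the argument count is simply `READ.args.len()`.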
We can use macro_rules! (or even a proc macro) to improve syscall_enum! (adapting from a table)
syscall_enum! {
pub enum Sysno {
/// See [read(2)](https://man7.org/linux/man-pages/man2/read.2.html) for more info on this syscall.
#[params = "(fd: i32, buf: str, count: usize) -> isize", categories = ["desc"]]
read = 0,
/// See [write(2)](https://man7.org/linux/man-pages/man2/write.2.html) for more info on this syscall.
#[params = "(fd: i32, buf: str, count: usize) -> isize", categories = ["desc"]]
write = 1,
/// See [open(2)](https://man7.org/linux/man-pages/man2/open.2.html) for more info on this syscall.
#[params = "(pathname: str, flags: i32, mode: u32) -> isize", categories = ["desc", "file"]]
open = 2,
The actual syntax for the arguments could be different to simplify macro_rules processing (proc_macros are much harder to implement and maintain).
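As a rough illustration of a macro_rules-friendly syntax, the sketch below drops the string attribute and writes the signature as plain tokens after the syscall number. The macro name `syscall_table!`, the `Ty` enum, and the exact grammar are assumptions made for this example; this is not how syscall_enum! currently works.

```rust
/// Hypothetical coarse argument types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    I32,
    U32,
    Usize,
    Isize,
    Str,
}

macro_rules! syscall_table {
    ($( $name:ident = $nr:literal, ( $( $arg:ident : $ty:ident ),* ) -> $ret:ident; )*) => {
        #[allow(non_camel_case_types)]
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        #[repr(usize)]
        pub enum Sysno {
            $( $name = $nr ),*
        }

        impl Sysno {
            /// Argument names and types, in call order.
            pub fn args(self) -> &'static [(&'static str, Ty)] {
                match self {
                    $( Sysno::$name => &[ $( (stringify!($arg), Ty::$ty) ),* ], )*
                }
            }

            /// Return type of the syscall.
            pub fn ret(self) -> Ty {
                match self {
                    $( Sysno::$name => Ty::$ret, )*
                }
            }
        }
    };
}

// Same three entries as the proposal above, in the simplified syntax.
syscall_table! {
    read = 0, (fd: I32, buf: Str, count: Usize) -> Isize;
    write = 1, (fd: I32, buf: Str, count: Usize) -> Isize;
    open = 2, (pathname: Str, flags: I32, mode: U32) -> Isize;
}
```

Because each match arm is a constant array, no lazy initialization or global argument table is needed; `Sysno::open.args()` returns a `'static` slice directly.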
A macro could parse these params into a few extra functions:
// sequentially store all arguments for all syscalls
static ALL_ARGS: [ArgType; 2500] = [
ArgType {name: "fd", typ: i32},
ArgType {name: "buf", typ: str},
ArgType {name: "count", typ: usize},
ArgType {name: "fd", typ: i32},
ArgType {name: "buf", typ: str},
ArgType {name: "count", typ: usize},
ArgType {name: "pathname", typ: str},
ArgType {name: "flags", typ: i32},
ArgType {name: "mode", typ: u32},
...
];
lazy_static! {
static ref ARG_MAP: [&[ArgType]; 450] = [
&ALL_ARGS[0..3], // read
&ALL_ARGS[3..6], // write
&ALL_ARGS[6..9], // open
...
];
}
/// Auto-generated function
pub fn get_arguments(syscall: usize) -> &'static [ArgType] {
ARG_MAP[syscall]
}
We would also generate lists for categories and return types; the exact format is TBD.
I saw lurk posted on /r/rust a while back. Looks exciting! I've actually got quite a lot of experience with what you're doing. ;)
Regarding syscall "groups": See https://github.com/jasonwhite/syscalls/issues/20. Defining sets of syscalls should be relatively easy. I would just copy what strace has done (see links in that issue). To keep the core of this library small, I'd also put it behind a feature flag that is off by default.
I'm also the author/maintainer of Reverie, where I also implemented a toy version of strace that is able to pretty-print the arguments. The mapping of syscall numbers to argument types is defined here. However, it's a crap-ton of code and I wouldn't do it that way again because, like you said, it is architecture-specific and it is hard to maintain. It only supports x86-64 and aarch64 right now. I can probably get reverie-syscalls published on crates.io if you wish.
Instead, because Linux is the source of truth and contains all the information you need, what I would do is this:
- Compile Linux for your target architecture with debug symbols on. The result is a vmlinux ELF file. Some distros include debug symbols in their compiled Linux at /boot/vmlinux-*, but this only covers one architecture.
- Using goblin or a similar ELF parsing library, find the syscall table symbol (sys_call_table). This is an array of syscall functions. Whenever Linux traps the syscall instruction, it calls the function with something like sys_call_table[sysno](args...).
- Conveniently, Linux also stores metadata on all syscalls (because BPF needs it). This metadata includes all of the argument names and types of every syscall.
- Now, just because we know the C types for all syscall arguments, it doesn't mean we can do much with it in the Rust world. For this, we can define a simple mapping to get the Rust version of the argument type. For example, unsigned char* becomes *mut u8, const int* becomes *const i32, struct statx* becomes *mut libc::statx, etc. This can convert the majority of argument types, but there are some cases where libc doesn't have the equivalent type. For those, you can just map it to *mut libc::c_void. However, if you really want the full definition of the struct, it can be found in the debug info as well. You'd just need to search transitively through the struct/union definitions.
- Using your newfound powers, you can generate the equivalent Rust code in whatever format you want, deriving the pretty-printing of the argument types. There are some very tricky corner cases, like figuring out what each of the possible request/response types are for ioctl. If you parse deeply enough, I think this can be derived as well.
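The C-to-Rust type mapping from the fourth step could start out as small as the sketch below. The function name `c_type_to_rust` and the tiny table of known types are assumptions for illustration; a real generator would need a far larger table and would have to handle multi-level pointers. It only produces type names as strings, which is what a code generator would emit.

```rust
/// Translate a C argument type, as found in syscall metadata, into a Rust type name.
/// Hypothetical helper: handles a single pointer level and a handful of base types.
pub fn c_type_to_rust(c_type: &str) -> String {
    // Normalise: drop the kernel's `__user` annotation and collapse whitespace.
    let tokens: Vec<&str> = c_type
        .split_whitespace()
        .filter(|tok| *tok != "__user")
        .collect();
    let mut ty = tokens.join(" ");

    // Peel off one trailing pointer level, if present.
    let is_ptr = ty.ends_with('*');
    if is_ptr {
        ty.pop();
        ty = ty.trim_end().to_string();
    }

    // `const T *` means the pointee is read-only.
    let is_const = ty.starts_with("const ");
    if is_const {
        ty = ty["const ".len()..].to_string();
    }

    let base = match ty.as_str() {
        "char" => Some("libc::c_char"),
        "unsigned char" => Some("u8"),
        "int" => Some("i32"),
        "unsigned int" => Some("u32"),
        "long" => Some("libc::c_long"),
        "unsigned long" => Some("libc::c_ulong"),
        "size_t" => Some("usize"),
        "struct statx" => Some("libc::statx"),
        _ => None,
    };

    match (is_ptr, base) {
        // Unknown pointee: fall back to an opaque pointer.
        (true, b) => format!(
            "*{} {}",
            if is_const { "const" } else { "mut" },
            b.unwrap_or("libc::c_void")
        ),
        (false, Some(b)) => b.to_string(),
        // Unknown scalar: keep the raw register width.
        (false, None) => "usize".to_string(),
    }
}
```

For instance, `c_type_to_rust("struct statx __user *")` yields `*mut libc::statx`, while an unrecognised `struct foo *` degrades to `*mut libc::c_void`, matching the fallback described above.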
This script does steps 1-3 and was used to generate the syscall table at https://jasonwhite.io/thing/syscalls/. Since this will likely generate a lot of code, I probably wouldn't put it in the syscalls library. It should probably go into its own crate.
FYI, I also wrote safeptrace. It helps to avoid shooting yourself in the foot with the ptrace API and also provides a very efficient async ptrace API. I've been meaning to get it published as a crate as well.
Thanks for the in-depth info! I have spent the past few days massively rewriting lurk (hope it gets merged :) ) - and you can see the more "syscalls-based" approach here.
There is still a lot to do, and working with a well-understood, small-scoped lib like safeptrace would be awesome! Please publish :)
At this stage, I am not looking to build a full decoder for each syscall, but some basics like (signed/unsigned) int, a bool, string, and address would be an awesome start. Eventually it might be fun to have auto-generated structs, but ... baby steps. So the steps above look reasonable, and I will see how they can best be implemented as an opt-in feature of your crate.
For syscall categories/groups, I don't think it should go into the syscall_enum! macro because then I think you'd have to modify syscall-gen to spit out the categories. I think the following would be a reasonable approach.
In some non-arch specific place:
// All the possible categories. Could use `EnumSetType` from `enumset` here.
pub enum Categories {
File,
Descriptor,
IPC,
Memory,
Creds,
// ...
}
Since src/arch/x86_64.rs is generated, the categories could be put into src/arch/x86_64/categories.rs:
use crate::Categories;
use crate::Categories::*;
static CATEGORIES: SysnoMap<Categories> = SysnoMap::new(&[
(Sysno::read, Descriptor),
(Sysno::open, Descriptor | File),
]);
impl Sysno {
pub fn categories(&self) -> Categories {
*CATEGORIES[self]
}
}
Unfortunately, having a separate CATEGORIES definition for every architecture is a bit repetitive, as the syscall categories will be the same for every architecture. I have some thoughts on how to avoid this duplication, but I'll put that into another GitHub issue.
Edit: Added those thoughts in #30.
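For reference, here is a dependency-free sketch of how `Descriptor | File` could work if `Categories` were a small bit-flag set instead of a plain enum. Everything here is hypothetical: the flag constants, `contains`, and especially `categories_of`, which keys the lookup by syscall name as one possible way to share a table across architectures (not necessarily what #30 proposes).

```rust
use std::ops::BitOr;

/// A set of categories stored as bit flags (stand-in for something like `enumset`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Categories(u32);

impl Categories {
    pub const NONE: Categories = Categories(0);
    pub const FILE: Categories = Categories(1 << 0);
    pub const DESCRIPTOR: Categories = Categories(1 << 1);
    pub const IPC: Categories = Categories(1 << 2);
    pub const MEMORY: Categories = Categories(1 << 3);
    pub const CREDS: Categories = Categories(1 << 4);

    /// True if every flag in `other` is also set in `self`.
    pub const fn contains(self, other: Categories) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for Categories {
    type Output = Categories;

    fn bitor(self, rhs: Categories) -> Categories {
        Categories(self.0 | rhs.0)
    }
}

/// Look up categories by syscall name rather than by number, so a single
/// table can serve every architecture. Only the two entries from the
/// example above are filled in.
pub fn categories_of(name: &str) -> Categories {
    match name {
        "read" => Categories::DESCRIPTOR,
        "open" => Categories::DESCRIPTOR | Categories::FILE,
        _ => Categories::NONE,
    }
}
```

A filter such as "show only file-related syscalls" then reduces to `categories_of(name).contains(Categories::FILE)`.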
I keep wondering whether it would make more sense to just add the args and their types (as an enum) as part of the generated syscalls enum - keeping two crates like this in sync may be a bit of a pain, plus I think compile time is not that different.
The actual parsing of the args based on that enum could be a separate crate outright, maybe part of the same repo. E.g. there could be a separate crate that focuses on register parsing - reading strings, signed/unsigned ints, and all sorts of other "magical" structs.
P.S. I am a bit unclear why I would need elf parsing - could we do similar (hacky) text parsing of the files to generate the arguments? Also, how does the syscalls-gen crate relate to the gen-syscall repo?
The number of different argument types for all the syscalls is quite large and deep, especially if you want to follow struct pointers or differentiate flags. For example, strace will read the stat struct pointer to pretty-print the information inside. I'm assuming you'll eventually want to do that. I don't want to pollute this library with those types (or maintain it). This library is meant to be low-level, providing only the basic necessities for dealing with syscalls.
Note that reverie-syscalls implements a typed interface for most syscalls and a way to display them. It is incomplete, but more than you'll likely ever need for an strace clone. For syscalls it doesn't have the arguments for, it just defaults to the 6 register values. So, it doesn't need to be 100% complete for it to be useful. There is a long tail of rarely used syscalls that you'll likely never see in the output of strace.
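The "fall back to the 6 register values" idea can be sketched as follows. This is not reverie-syscalls' actual API; the `Decoded` enum and `decode` function are hypothetical, and only `read` (number 0 on x86-64) is given a typed variant.

```rust
use std::fmt;

/// A decoded syscall: typed when the signature is known, raw registers otherwise.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Decoded {
    Read { fd: i32, buf: usize, count: usize },
    /// Fallback: syscall number plus the six argument registers.
    Raw { nr: usize, regs: [usize; 6] },
}

/// Decode using x86-64 numbering; anything unknown keeps its raw registers.
pub fn decode(nr: usize, regs: [usize; 6]) -> Decoded {
    match nr {
        0 => Decoded::Read {
            fd: regs[0] as i32,
            buf: regs[1],
            count: regs[2],
        },
        _ => Decoded::Raw { nr, regs },
    }
}

impl fmt::Display for Decoded {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Decoded::Read { fd, buf, count } => {
                write!(f, "read({}, {:#x}, {})", fd, buf, count)
            }
            Decoded::Raw { nr, regs } => {
                write!(f, "syscall_{}(", nr)?;
                for (i, reg) in regs.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{:#x}", reg)?;
                }
                write!(f, ")")
            }
        }
    }
}
```

The tracer stays useful from day one, and typed variants can be added for the frequently seen syscalls first while the long tail keeps printing as raw registers.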
I am a bit unclear why I would need elf parsing
You don't need it, but I think it's way easier, more accurate, and more maintainable than trying to extract all the arguments via grepping the Linux codebase. With ELF parsing, you'll get a complete list of syscall numbers mapping to their arguments.
Here's a JSON file that I generated a couple of years ago that contained all of the syscalls (at the time) along with their arguments using the ELF parsing method. The hardest part is just building a kernel with debug symbols. Generating that list for other architectures is just a simple matter of building the Linux kernel for your target architecture. I had plans to automate this one day using GitHub Actions, but never got around to it.
Looks like linux-raw-sys contains all of the types that could possibly be used for syscall arguments. This combined with the method described above should yield rich type information for all syscall arguments.
I've also released a more complete syscall argument scraper that can do a hacky C-type to Rust-type translation, but maybe the type name conversion is unnecessary with linux-raw-sys.
@nyurik, let me share some other ideas.
could we do similar (hacky) text parsing of the files to generate the arguments?
You mean parsing Linux source? This is not so easy. Consider this code from https://elixir.bootlin.com/linux/v6.7.1/source/kernel/fork.c :
#ifdef CONFIG_CLONE_BACKWARDS
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
int __user *, parent_tidptr,
unsigned long, tls,
int __user *, child_tidptr)
#elif defined(CONFIG_CLONE_BACKWARDS2)
SYSCALL_DEFINE5(clone, unsigned long, newsp, unsigned long, clone_flags,
int __user *, parent_tidptr,
int __user *, child_tidptr,
unsigned long, tls)
#elif defined(CONFIG_CLONE_BACKWARDS3)
SYSCALL_DEFINE6(clone, unsigned long, clone_flags, unsigned long, newsp,
int, stack_size,
int __user *, parent_tidptr,
int __user *, child_tidptr,
unsigned long, tls)
#else
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
int __user *, parent_tidptr,
int __user *, child_tidptr,
unsigned long, tls)
#endif
So, as you can see, we would need to run the Linux source through the preprocessor with the proper macros defined. I think this is more difficult than parsing ELF.
Also, you can see prototypes in https://elixir.bootlin.com/linux/v6.7/source/include/linux/syscalls.h . But again, you can see "#ifdef"s here.
Also, there is a very simple way to get syscall prototypes: just parse /sys/kernel/debug/tracing/events/syscalls/sys_enter_* on a running kernel. :)
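A minimal sketch of the tracefs approach: each sys_enter_<name> directory contains a `format` file whose `field:` lines carry a C declaration followed by offset and size information. The parser below works on the file's text, so reading it from disk (which usually requires root) is left to the caller. `TraceArg` and `parse_format` are hypothetical names.

```rust
/// One syscall argument as described by a tracefs `format` file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceArg {
    pub c_type: String,
    pub name: String,
}

/// Parse a single line such as
/// "\tfield:unsigned int fd;\toffset:16;\tsize:8;\tsigned:0;".
fn parse_field(line: &str) -> Option<TraceArg> {
    let rest = line.trim().strip_prefix("field:")?;
    // Keep only the declaration part before the first ';'.
    let decl = rest.split(';').next()?.trim();
    // The declaration is "<type> <name>"; the name is the last word.
    let (c_type, name) = decl.rsplit_once(' ')?;
    // Skip the generic event header fields and the syscall number itself.
    if name.starts_with("common_") || name == "__syscall_nr" {
        return None;
    }
    Some(TraceArg {
        c_type: c_type.trim().to_string(),
        name: name.to_string(),
    })
}

/// Extract the syscall arguments, in order, from the text of a `format` file.
pub fn parse_format(text: &str) -> Vec<TraceArg> {
    text.lines().filter_map(parse_field).collect()
}
```

Feeding it the contents of sys_enter_read/format should give three entries (fd, buf, count); running it over every sys_enter_* directory would give the full table for the running kernel and architecture only.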
Another way: somehow extract needed info from strace sources.
Another way: use data from syzkaller, for example: https://github.com/google/syzkaller/blob/cc4a4020ecb6d62110981f597feea0c04a643efa/sys/linux/filesystem.txt
@jasonwhite, let me share my experience from using https://github.com/jasonwhite/gen-syscalls .
I want to get prototypes for all Linux syscalls. I want to do this the "right" way, the most robust and correct way. As I said in the previous message, this is not so easy.
I found this issue, so I tried https://github.com/jasonwhite/gen-syscalls . I immediately found a bug that prevents building with a modern rustc: Cargo.toml doesn't enable the derive feature of serde. After fixing Cargo.toml I was able to successfully get syscall prototypes from vmlinux.
Here is a note to everybody reading this issue, including @nyurik: you don't need to build Linux yourself. Just download the file https://deb.debian.org/debian/pool/main/l/linux/linux-image-*-dbg_*.deb. Note the dbg part. The file will contain the needed vmlinux image.
Okay, so I was able to extract syscall info from Linux 4.19.0 x86_64. Then I tried on ARM Linux from this link:
http://ftp.debian.org/debian/pool/main/l/linux/linux-image-6.1.0-16-arm64-dbg_6.1.67-1_arm64.deb
And the program crashed. So, it seems it was never tested on anything outside of x86.
So, I decided I will not use it. Because I want something absolutely "right", and a program that doesn't work for ARM doesn't feel "right" enough.
I'm also aware of https://github.com/facebookexperimental/reverie/tree/main/experimental/scrape-syscalls , but I decided not to try it. I suspect that it may be buggy, too. So I decided to use the most "right" way: I will just parse /sys/kernel/debug/tracing/events/syscalls/sys_enter_*.
Also, it seems https://github.com/facebookexperimental/reverie/tree/main/experimental/scrape-syscalls maps kernel types to types from crate libc as opposed to crate linux-raw-sys. I don't like this.
This is not an insult; it is an experience report.
@safinaskar Thanks for the experience report. Do note that it is an experimental tool and quite hacky. I open-sourced it in case someone finds it useful and wants to build on top of the idea.
Just to provide a little more background info: The information from /sys/kernel/debug/tracing/events/syscalls/sys_enter_* is the same information that exists in the kernel's debug info, which is derived via the SYSCALL_METADATA macro.
I found it easier to get this info from a compiled kernel rather than a running kernel because, like you've found, it is easy to download a kernel image for a range of versions and architectures. I believe it also contains a superset of information, which may or may not be needed for this use-case.
If you want to gather syscall info via sysfs for a specific kernel version and architecture, then you'll just have to use a VM to run that particular kernel. This is probably possible with GitHub's CI, but I'm not sure how much control you have over the kernel version. The output of sysfs is also less likely to change in the future, since it is more observable to the user than the debug info.
Because I want something absolutely "right", and a program that doesn't work for ARM doesn't feel "right" enough.
The "right" way is whatever works well.
Note that ARM support shouldn't be too hard to add. There are probably only a few architecture-specific things (e.g., the __x64_ symbol prefix).
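To illustrate the kind of architecture-specific detail meant here, a symbol-name normaliser might look like the sketch below. Only the `__x64_` prefix is mentioned in this thread; the other prefixes in the list are assumptions based on the kernel's per-architecture syscall wrappers and should be checked against the actual symbol table of the image being parsed. The function name is hypothetical.

```rust
/// Strip an architecture-specific syscall wrapper prefix from a kernel symbol,
/// returning the bare syscall name, or `None` if no known prefix matches.
pub fn syscall_name_from_symbol(symbol: &str) -> Option<&str> {
    // Assumed prefixes; longer ones come first so that "sys_" is a last resort.
    const PREFIXES: [&str; 4] = ["__x64_sys_", "__ia32_sys_", "__arm64_sys_", "sys_"];
    PREFIXES
        .iter()
        .find_map(|prefix| symbol.strip_prefix(*prefix))
}
```

With the prefix handled in one place, the rest of the scraper can stay architecture-neutral: `__x64_sys_read` and `__arm64_sys_read` both normalise to `read`.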
Also, it seems https://github.com/facebookexperimental/reverie/tree/main/experimental/scrape-syscalls maps kernel types to types from crate libc as opposed to crate linux-raw-sys. I don't like this.
If you don't like it, then change it. :)
I just found a brilliant project for extracting syscall info: https://github.com/mebeim/systrack
Great find! Looks like a much more polished version of what I implemented.