tracee icon indicating copy to clipboard operation
tracee copied to clipboard

events: add bpf_attach event

Open roikol opened this issue 3 years ago • 4 comments

this event indicates an ebpf program being attached to a perf event. this event includes arguments describing the bpf program, and arguments describing the perf event.

an example event: 14:18:52:018729 0 bpftrace 30743 30743 0 bpf_attach prog_type: BPF_PROG_TYPE_KPROBE, prog_name: do_nanosleep, perf_symbol: do_nanosleep, perf_addr: 0, prog_write_user: false, perf_type: kretprobe

Initial Checklist

  • [x] There is an issue describing the need for this PR.
  • [x] Git log contains summary of the change.
  • [x] Git log contains motivation and context of the change.
  • [ ] If part of an EPIC, PR git log contains EPIC number.
  • [ ] If part of an EPIC, PR was added to EPIC description.

Description (git log)

this event indicates an ebpf program being attached to a perf event. this event includes arguments describing the bpf program, and arguments describing the perf event.

Fixes: #2003

more details about this PR:

bpf program can be attached to a perf event in the system in two methods. in both this methods, the attach is between an already loaded bpf program to the already created perf event.

  • calling ioctl syscall with command PERF_EVENT_IOC_SET_BPF.
  • calling bpf syscall with command BPF_LINK_CREATE. (only in newer kernel versions).

so for each method we extract the structs representing the bpf program and the perf event, and pass them on to the new send_bpf_attach function, which extracts all the arguments and outputs the event to userspace.

one of the arguments in this event is the prog_write_user. this argument indicates whether or not the bpf program uses the bpf_probe_write_user helper (to be used in signatures). to fetch this information, we probe into the verfier function check_helper_call/check_map_func_compatibility (these functions are not always available - hence using both of them) which are called for each bpf helper called from a bpf program.

ideally, we would save this imformation in a map ({host_tid, prog_id} -> true/false), and then use it in a later phase in send_bpf_attach.

the problem is that in the verifier phase, the bpf program does not yet have an allocated prog_id. to overcome this issue, we save the information in a temporary map (host_tid -> true/false), and we also probe into security_bpf_prog. in security_bpf_prog the prog_id is already allocated, so we read from the temporary map, and save the information in the stable map ({host_tid, prog_id} -> true/false).

this approach works because check_helper_call/check_map_func_compatibility and security_bpf_prog are called in the context of the same syscall (bpf with command BPF_PROG_LOAD), so the temporary map with key of host_tid is enough. to access this value in send_bpf_attach, we must add the prog_id because it happens in the context of another syscall (mentioned above).

Type of change

  • [ ] Bug fix (non-breaking change fixing an issue, preferable).
  • [ ] Quick fix (minor non-breaking change requiring no issue, use with care)
  • [ ] Code refactor (code improvement and/or code removal)
  • [x] New feature (non-breaking change adding functionality).
  • [ ] Breaking change (cause existing functionality not to work as expected).

How Has This Been Tested?

locally (with different kernel versions).

Reproduce the test by running:

  • command 01: ./dist/tracee-ebpf -t e=bpf_attach -o option:parse-arguments
  • command 02: sudo bpftrace -e 'kretprobe:do_nanosleep { printf("PID %d sleeping...\n", pid); }'

Final Checklist:

Pick "Bug Fix" or "Feature", delete the other and mark appropriate checks.

  • [x] I have made corresponding changes to the documentation.
  • [x] My code follows the style guidelines (C and Go) of this project.
  • [x] I have performed a self-review of my own code.
  • [x] I have commented all functions/methods created explaining what they do.
  • [x] I have commented my code, particularly in hard-to-understand areas.
  • [x] My changes generate no new warnings.
  • [ ] I have added tests that prove my fix, or feature, is effective.
  • [ ] New and existing unit tests pass locally with my changes.
  • [ ] Any dependent changes have been merged and published before.

Git Log Checklist:

My commits logs have:

  • [x] Subject starts with "subsystem|file: description".
  • [x] Do not end the subject line with a period.
  • [x] Limit the subject line to 50 characters.
  • [x] Separate subject from body with a blank line.
  • [x] Use the imperative mood in the subject line.
  • [x] Wrap the body at 72 characters.
  • [x] Use the body to explain what and why instead of how.

roikol avatar Aug 21 '22 11:08 roikol

this PR bumps the version of libbpfgo to make use of https://github.com/aquasecurity/libbpfgo/pull/209.

currently failing until #2060 will be solved (change to libbpfgo was already merged).

roikol avatar Aug 21 '22 11:08 roikol

explaining the usage of both probes: check_helper_call and check_map_func_compatibility.

in this PR i use these two functions for the same purpose - saving for later usage whether or not the bpf program uses the bpf_probe_write_user helper.

ideally, only one of these functions would be used. but in the tests i condacted to see if the event is ok (outputs when needed and values are correct), i encountered two problems:

  • check_helper_call is being optimized in some kernel builds and its symbol appears as check_helper_call.isra.X. this means that we could not attach to check_helper_call. the solution might have been to regex the available sybmols with check_helper_call.* for example, to get the correct full symbol. the problem is that the compiler optimization also changes the actuall arguments passed to this function. thus, a bpf program that attaches to this event could not accuratly read the arguments of the function (this is probably the reason the symbol is changed).
  • check_map_func_compatibility in some kernel builds is not available for tracing. from what i see, it is always exists in /proc/kallsyms, but in some cases it is not available in /sys/kernel/tracing/available_filter_functions.

because of these two problems i attempt to probe into both of the functions, hoping at least one of them will work (which will be the case on most of the systems). if both of the probes are not working - the value of the argument is "unknown"

roikol avatar Aug 21 '22 14:08 roikol

Thanks for the detailed description, makes my life much easier to understand amount of probes and correlation in between temporary and permanent/stable map values. Reviewing this now...

rafaeldtinoco avatar Sep 14 '22 17:09 rafaeldtinoco

It looks good to me (without testing, just reading code and functions that inspired the logic).

Would you mind rebasing, addressing some minor comments (mostly related to style and formatting), so we can go ahead, look for a last time and merge it ?

I'm also proposing the following minor format/comment changes for better readability (code looked dense at a first look):

diff --git a/pkg/ebpf/c/tracee.bpf.c b/pkg/ebpf/c/tracee.bpf.c
index 1fc87b1f..a3b44a18 100644
--- a/pkg/ebpf/c/tracee.bpf.c
+++ b/pkg/ebpf/c/tracee.bpf.c
@@ -3830,10 +3830,8 @@ static __always_inline void *get_trace_probe_from_trace_event_call(struct trace_
     return tracep_ptr;
 }
 
-/*
-this function is inspired by bpf_get_perf_event_info() func.
-https://elixir.bootlin.com/linux/v5.19.2/source/kernel/trace/bpf_trace.c#L2123
-*/
+// Inspired by v5.19: bpf_get_perf_event_info().
+
 static __always_inline int
 send_bpf_attach(event_data_t *data, struct file *bpf_prog_file, struct file *perf_event_file)
 {
@@ -3841,11 +3839,11 @@ send_bpf_attach(event_data_t *data, struct file *bpf_prog_file, struct file *per
         return 0;
     }
 
-    // get perf event details
-
-#define MAX_PERF_EVENT_NAME                                                                        \
-    ((MAX_PATH_PREF_SIZE > MAX_KSYM_NAME_SIZE) ? MAX_PATH_PREF_SIZE : MAX_KSYM_NAME_SIZE)
-#define REQUIRED_SYSTEM_LERNGTH 9
+    // clang-format off
+    #define MAX_PERF_EVENT_NAME ((MAX_PATH_PREF_SIZE > MAX_KSYM_NAME_SIZE) ? \
+            MAX_PATH_PREF_SIZE : MAX_KSYM_NAME_SIZE)
+    #define REQUIRED_SYSTEM_LERNGTH 9
+    // clang-format off
 
     struct perf_event *event = (struct perf_event *) READ_KERN(perf_event_file->private_data);
     struct trace_event_call *tp_event = READ_KERN(event->tp_event);
@@ -3859,82 +3857,76 @@ send_bpf_attach(event_data_t *data, struct file *bpf_prog_file, struct file *per
 
     // get is_syscall_tracepoint
     bool is_syscall_tracepoint = false;
+
     struct trace_event_class *tp_class = READ_KERN(tp_event->class);
+
     char class_system[REQUIRED_SYSTEM_LERNGTH];
     bpf_probe_read_str(&class_system, REQUIRED_SYSTEM_LERNGTH, READ_KERN(tp_class->system));
     class_system[REQUIRED_SYSTEM_LERNGTH - 1] = '\0';
-    if (has_prefix("syscalls", class_system, REQUIRED_SYSTEM_LERNGTH)) {
+
+    if (has_prefix("syscalls", class_system, REQUIRED_SYSTEM_LERNGTH))
         is_syscall_tracepoint = true;
-    }
 
     int perf_type;
-    if (is_tracepoint) {
-        // this perf event is tracepoint
+    if (is_tracepoint) { // event is tracepoint
 
         perf_type = PERF_TRACEPOINT;
-
         struct tracepoint *tp = READ_KERN(tp_event->tp);
         bpf_probe_read_str(&event_name, MAX_KSYM_NAME_SIZE, READ_KERN(tp->name));
-    } else if (is_syscall_tracepoint) {
-        // this perf event is tracepoint
 
-        perf_type = PERF_TRACEPOINT;
+    } else if (is_syscall_tracepoint) { // event is tracepoint
 
+        perf_type = PERF_TRACEPOINT;
         bpf_probe_read_str(&event_name, MAX_KSYM_NAME_SIZE, READ_KERN(tp_event->name));
-    } else {
+
+    } else { // event is kprobe or kretprobe
+
         bool is_ret_probe = false;
         void *tracep_ptr = get_trace_probe_from_trace_event_call(tp_event);
 
-        if (flags & TRACE_EVENT_FL_KPROBE) {
-            // this perf event is kprobe
+        if (flags & TRACE_EVENT_FL_KPROBE) { // perf event is kprobe
 
             struct trace_kprobe *tracekp = get_trace_kprobe_from_trace_probe(tracep_ptr);
 
-            // determine if ret probe
+            // check if probe is a kretprobe
             struct kretprobe *krp = &tracekp->rp;
             kretprobe_handler_t handler_f = READ_KERN(krp->handler);
-            if (handler_f != NULL) {
+            if (handler_f != NULL)
                 is_ret_probe = true;
-            }
 
-            // get correct perf type
-            if (is_ret_probe) {
+            if (is_ret_probe)
                 perf_type = PERF_KRETPROBE;
-            } else {
+            else
                 perf_type = PERF_KPROBE;
-            }
 
             // get symbol name
             bpf_probe_read_str(&event_name, MAX_KSYM_NAME_SIZE, READ_KERN(tracekp->symbol));
 
             // get symbol address
-            if (!event_name[0]) {
+            if (!event_name[0])
                 probe_addr = (unsigned long) READ_KERN(krp->kp.addr);
-            }
-        } else if (flags & TRACE_EVENT_FL_UPROBE) {
-            // this perf event is uprobe
+
+        } else if (flags & TRACE_EVENT_FL_UPROBE) { // perf event is uprobe
 
             struct trace_uprobe *traceup = get_trace_uprobe_from_trace_probe(tracep_ptr);
 
             // determine if ret probe
             struct uprobe_consumer *upc = &traceup->consumer;
             void *handler_f = READ_KERN(upc->ret_handler);
-            if (handler_f != NULL) {
+            if (handler_f != NULL)
                 is_ret_probe = true;
-            }
 
-            // get correct perf type
-            if (is_ret_probe) {
+            if (is_ret_probe)
                 perf_type = PERF_URETPROBE;
-            } else {
+            else
                 perf_type = PERF_UPROBE;
-            }
 
             // get binary path
             bpf_probe_read_str(&event_name, MAX_KSYM_NAME_SIZE, READ_KERN(traceup->filename));
 
             // get symbol offset
             probe_addr = READ_KERN(traceup->offset);
+
         } else {
             // unsupported perf type
             return 0;
@@ -3956,9 +3948,8 @@ send_bpf_attach(event_data_t *data, struct file *bpf_prog_file, struct file *per
     key.prog_id = prog_id;
 
     bpf_attach_t *val = bpf_map_lookup_elem(&bpf_attach_map, &key);
-    if (val == NULL) {
+    if (val == NULL)
         return 0;
-    }
 
     // submit the event
 
@@ -5820,20 +5811,17 @@ int BPF_KPROBE(trace_security_bpf_prog)
     u32 prog_id = READ_KERN(prog_aux->id);
 
     bpf_attach_t val = {};
-    /*
-    in some systems, the 'check_map_func_compatibility' and 'check_helper_call' symbols are not
-    available. in these cases, the temporary map 'bpf_attach_tmp_map' will not hold any information
-    (WRITE_USER_FALSE/WRITE_USER_TRUE). nevertheless, we always want to output the 'bpf_attach'
-    event to the user, so using the WRITE_USER_UNKNOWN value instead of returning, just so the map
-    would be filled.
-    */
+    // In some systems, the 'check_map_func_compatibility' and 'check_helper_call' symbols are not
+    // available. For these cases, the temporary map 'bpf_attach_tmp_map' will not hold any
+    // information (WRITE_USER_FALSE/WRITE_USER_TRUE). nevertheless, we always want to output the
+    // 'bpf_attach' event to the user, so using the WRITE_USER_UNKNOWN value instead of returning,
+    // just so the map would be filled.
     val.write_user = WRITE_USER_UNKNOWN;
 
-    bpf_attach_t *existing_val =
-        bpf_map_lookup_elem(&bpf_attach_tmp_map, &data.context.task.host_tid);
-    if (existing_val != NULL) {
+    bpf_attach_t *existing_val;
+    existing_val = bpf_map_lookup_elem(&bpf_attach_tmp_map, &data.context.task.host_tid);
+    if (existing_val != NULL)
         val.write_user = existing_val->write_user;
-    }

     bpf_attach_key_t key = {0};
     key.host_tid = data.context.task.host_tid;
@@ -5846,24 +5834,22 @@ int BPF_KPROBE(trace_security_bpf_prog)
     return 0;
 }

-/*
-this function saves in the temporary map 'bpf_attach_tmp_map' whether or not bpf_probe_write_user is
-used in the bpf program. we get this information in the verifier phase of the bpf program load
-lifecycle, before a prog_id is set for the bpf program. so we save this information in a temporary
-map which includes the prog_aux pointer in the key instead of the prog_id.
+// Save in the temporary map 'bpf_attach_tmp_map' whether or not bpf_probe_write_user is used in the
+// bpf program. Get this information in the verifier phase of the bpf program load lifecycle, before
+// a prog_id is set for the bpf program. Save this information in a temporary map which includes the
+// prog_aux pointer in the key instead of the prog_id.
+//
+// Later on, in security_bpf_prog, save this information in the stable map 'bpf_attach_map', which
+// contains the prog_id in its key.

-later on, in security_bpf_prog, we save this information in the stable map 'bpf_attach_map', which
-contains the prog_id in its key.
-*/
 static __always_inline int handle_bpf_helper_func_id(u32 host_tid, int func_id)
 {
     bpf_attach_t val = {};
     val.write_user = WRITE_USER_FALSE;

     bpf_attach_t *existing_val = bpf_map_lookup_elem(&bpf_attach_tmp_map, &host_tid);
-    if (existing_val == NULL) {
+    if (existing_val == NULL)
         bpf_map_update_elem(&bpf_attach_tmp_map, &host_tid, &val, BPF_ANY);
-    }

     if (func_id == BPF_FUNC_probe_write_user) {
         val.write_user = WRITE_USER_TRUE;

rafaeldtinoco avatar Sep 14 '22 22:09 rafaeldtinoco

Thanks @roikol, it all looks good to me now. Logic seems ok and it is self contained (related to tracee inner engine), reducing potential issues. I'm merging it now (might do small refactors on code position and comments in a near future when I start splitting the eBPF source code into multiple ones, fyio).

rafaeldtinoco avatar Oct 19 '22 18:10 rafaeldtinoco

I'll squash and add appropriate event description to the git log as well.

rafaeldtinoco avatar Oct 19 '22 18:10 rafaeldtinoco

thank you @rafaeldtinoco !!

roikol avatar Oct 20 '22 14:10 roikol