blazesym Issue symbolizing Android stack addresses

Hi everyone! I'm having trouble symbolizing stack addresses from a process on Android 14.

I'm trying to symbolize raw stack addresses obtained by a BPF program of the "perf_event" attachment type, and both user-space and kernel-space addresses are affected by the same problem.

First, I tested my program on a Linux system with kernel version 6.6.31, and address symbolization worked smoothly.

Then I ran the same program (using the C API, capi) in a chrooted Debian environment mounted inside /data to test it on Android.

Details

Kernel version: 6.1.23-android14-4-00257-g7e35917775b8-ab9964412
Architecture: x86_64
Kernel config:

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
# BPF subsystem
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
# CONFIG_BPF_UNPRIV_DEFAULT_OFF is not set
# CONFIG_BPF_PRELOAD is not set
# CONFIG_BPF_LSM is not set
# end of BPF subsystem
CONFIG_CGROUP_BPF=y
CONFIG_NETFILTER_XT_MATCH_BPF=y
# CONFIG_BPFILTER is not set
CONFIG_NET_CLS_BPF=y
CONFIG_NET_ACT_BPF=y
# CONFIG_BPF_STREAM_PARSER is not set
CONFIG_BPF_LIRC_MODE2=y
CONFIG_FUSE_BPF=y
CONFIG_BPF_EVENTS=y
# CONFIG_TEST_BPF is not set

Backtrace

Unfortunately, setting RUST_BACKTRACE=1 didn't give me a stack trace the normal way, so I got one using gdb.

[#0] 0x485899 → core::slice::raw::from_raw_parts::precondition_check(data=0x7ffff741033c, size=0x8, align=0x8, len=0x5)
[#1] 0x70cefd → core::slice::raw::from_raw_parts<u64>(data=0x7ffff741033c, len=0x5)
[#2] 0x69d341 → blazesym_c::slice_from_user_array<u64>(items=0x7ffff741033c, num_items=0x5)
[#3] 0x43ac1f → blazesym_c::symbolize::blaze_symbolize_impl(symbolizer=0x7ffff0001630, src=blazesym::symbolize::source::Source::Process(blazesym::symbolize::source::Process {
    pid: blazesym::pid::Pid::Pid(core::num::nonzero::NonZero<u32> (
        0x341f
      )),
    debug_syms: 0x0,
    perf_map: 0x0,
    map_files: 0x0,
    _non_exhaustive: ()
  }), inputs=blazesym::symbolize::Input<*const u64>::AbsAddr(0x7ffff741033c), input_cnt=0x5)
[#4] 0x43b3be → blazesym_c::symbolize::blaze_symbolize_process_abs_addrs(symbolizer=0x7ffff0001630, src=0x7ffff7c17c10, abs_addrs=0x7ffff741033c, abs_addr_cnt=0x5)
[#5] 0x409be4 → show_stack_trace(stack=0x7ffff741033c, stack_sz=0x5, pid=0x0)
[#6] 0x409be4 → handle_event(ctx=<optimized out>, cpu=<optimized out>, stack_data=0x7ffff741000c, stack_size=<optimized out>)
[#7] 0x411a11 → perf_buffer__process_record(e=<optimized out>, ctx=<optimized out>)
[#8] 0x411b62 → perf_event_read_simple(mmap_mem=0x7ffff740f000, mmap_size=0x8000, page_size=<optimized out>, copy_mem=0x7ffff0001860, copy_size=0x7ffff0001868, private_data=0x7ffff0001850, fn=0x4119c0 <perf_buffer__process_record>)
[#9] 0x422403 → perf_buffer__process_records(pb=<optimized out>, pb=0x7ffff00017b0, cpu_buf=<optimized out>)

stdout:

...
thread '<unnamed>' panicked at library/core/src/panicking.rs:156:5:
unsafe precondition(s) violated: slice::from_raw_parts requires the pointer to be aligned and non-null, and the total size of the slice not to exceed `isize::MAX`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.
Aborted

Initially, I thought the problem was related to the "perf_event" attachment type ("perf_event" is listed neither here nor in the loader.cpp source file), but after changing the attachment type I had the same problem, so I'm ruling it out.

I'm wondering if there are any differences I should be aware of when symbolizing addresses on Android. I may not have provided all the information needed to reproduce the problem, so I'm ready to provide further details if needed.

Thank you so much in advance to everybody who takes a look at this issue!

May 17 '24 16:05 faccimatteo

Thanks for the report! I doubt it is a problem specific to Android. It appears as if the slice's address isn't 8-byte aligned as it should be. That sounds like a bug in blazesym. I'll check to see if we can identify the allocation.

May 17 '24 18:05 d-e-s-o

Sorry, I read the backtrace wrong. The issue is on your end.

[#5] 0x409be4 → show_stack_trace(stack=0x7ffff741033c, stack_sz=0x5, pid=0x0)

The address of the stack array is not correctly aligned. It's an array of u64. C++, C, and Rust require natural alignment, meaning that 8-byte values need to reside on an 8-byte boundary. 0x7ffff741033c is not evenly divisible by eight.

Edit: So try using calloc instead of malloc or make sure to align the array properly somehow.
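To make the arithmetic in the comment above concrete: an address is suitably aligned for a u64 array when it is a multiple of alignof(uint64_t), which is 8 on x86_64. Here is a minimal sketch of such a check; the helper names are hypothetical, and guarding the pointer with assert() right before calling blaze_symbolize_process_abs_addrs() is just one possible way to catch the problem early in debug builds.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Distance in bytes past the previous 8-byte boundary.
 * 0x7ffff741033c (frame #5) gives 4, so it is not 8-byte aligned. */
static unsigned misalignment_u64(uint64_t addr)
{
    return (unsigned)(addr % 8);
}

/* True if p meets the natural alignment of uint64_t on this platform. */
static bool is_u64_aligned(const void *p)
{
    return (uintptr_t)p % alignof(uint64_t) == 0;
}

/* Hypothetical debug guard: fail early with a clear message instead of
 * deep inside the symbolizer. */
static const uint64_t *checked_u64_array(const uint64_t *p)
{
    assert(is_u64_aligned(p) && "u64 array must be 8-byte aligned");
    return p;
}
```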

May 17 '24 20:05 d-e-s-o

Thanks for your help; indeed, you're right.

To give you a brief overview of how I am profiling stack traces, this is the structure of the current BPF program I'm using (for now I'm using a perf buffer, but the plan is to switch to a ring buffer very soon):

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(max_entries, 2);
    __uint(key_size, sizeof(int));
    __uint(value_size, sizeof(__u32));
} perfmap SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct stack_trace_t);
} stackdata_map SEC(".maps");

SEC("perf_event")
int get_stacktrace(void *ctx)
{
    ...

    data = bpf_map_lookup_elem(&stackdata_map, &key);
    if (!data)
      return 0;

    ...

    // MAX_STACK_RAWTP is the maximum number of stack frames we can retrieve
    max_len = MAX_STACK_RAWTP * sizeof(__u64);
    max_buildid_len = MAX_STACK_RAWTP * sizeof(struct bpf_stack_build_id);
    data->pid = pid;
    data->kern_stack_size = bpf_get_stack(ctx, data->kern_stack, max_len, 0);
    if (data->kern_stack_size < 0) {
      bpf_printk("bpf_get_stack: failed to get kernel stack");
    }
    data->user_stack_size = bpf_get_stack(ctx, data->user_stack, max_len,
                                          BPF_F_USER_STACK);
    if (data->user_stack_size < 0) {
      bpf_printk("bpf_get_stack: failed to get user stack");
    }

    // BPF_F_CURRENT_CPU selects the perfmap slot of the CPU we are running
    // on; a fixed index of 0 only succeeds for samples taken on CPU 0
    bpf_perf_event_output(ctx, &perfmap, BPF_F_CURRENT_CPU, data, sizeof(*data));

    return 0;
}

And here is the corresponding userspace code, where I set up the perf event sampler:

int main(int argc, char **argv)
{
	struct perf_event_attr attr;
	int *pefds = NULL, pefd;

	...

	memset(&attr, 0, sizeof(attr));
	//attr.type = PERF_TYPE_HARDWARE;
	attr.type = PERF_TYPE_SOFTWARE;
	attr.size = sizeof(attr);
	// attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.config = PERF_COUNT_SW_CPU_CLOCK;
	attr.sample_freq = 10000;
	attr.freq = 1;
	/* Needed in a VM */
	attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_CALLCHAIN | PERF_SAMPLE_STACK_USER;

	...
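For context, here is a minimal sketch of how such a sampling event is typically opened, one per CPU, so that the perf_event program can be attached to each file descriptor (for example with libbpf's bpf_program__attach_perf_event()). It assumes system-wide sampling (pid == -1) and mirrors the PERF_COUNT_SW_CPU_CLOCK settings of the snippet; the helper names are hypothetical. glibc has no perf_event_open() wrapper, hence the raw syscall. Note that this event only drives sampling: the PERF_COUNT_SW_BPF_OUTPUT events behind perfmap are normally created by libbpf's perf_buffer__new(), not by this code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc does not wrap perf_event_open(2), so invoke the syscall directly. */
static int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
{
    return (int)syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Frequency-based sampling of the software CPU clock. */
static void init_cpu_clock_attr(struct perf_event_attr *attr, __u64 freq)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_SOFTWARE;
    attr->size = sizeof(*attr);
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->sample_freq = freq;
    attr->freq = 1;
}

/* Opens one event per CPU for all processes (pid == -1). fds[cpu] is -1
 * where opening fails; returns how many events were opened. */
static int open_cpu_clock_events(int *fds, int ncpus, __u64 freq)
{
    struct perf_event_attr attr;
    int opened = 0;

    assert(fds != NULL && ncpus >= 0);
    init_cpu_clock_attr(&attr, freq);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        fds[cpu] = sys_perf_event_open(&attr, -1, cpu, -1,
                                       PERF_FLAG_FD_CLOEXEC);
        if (fds[cpu] >= 0)
            opened++;
    }
    return opened;
}
```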

As far as I know, bpf_get_stack() is responsible for sending the right stack addresses to userspace, where I can then retrieve them through the loaded perfmap. In fact, if I align the stack pointer by subtracting a value to reach the closest aligned address, everything seems to work fine.

I could change my code to align the stack address, but shouldn't bpf_get_stack() already do that for me? I'm wondering whether anyone has run into something similar; it seems like a fixed offset is always being added.
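A minimal sketch of why rounding the pointer down is risky, assuming a little-endian machine (x86_64) and a u64 array that starts 4 bytes past an 8-byte boundary, as in the backtrace: moving the start back 4 bytes does not realign the data, it shifts every load, so each value read combines the upper half of one address with the lower half of the next. The helpers are hypothetical; memcpy is used so that the loads are well defined at any alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Loads element i of a u64 array starting at p, whatever p's alignment. */
static uint64_t load_u64_at(const unsigned char *p, size_t i)
{
    uint64_t v;

    assert(p != NULL);
    memcpy(&v, p + i * sizeof(v), sizeof(v));
    return v;
}

/* "Aligning by subtracting": round p down to the previous 8-byte boundary.
 * For p == 4 (mod 8) this moves the start of the array back by 4 bytes. */
static const unsigned char *round_down_8(const unsigned char *p)
{
    return (const unsigned char *)((uintptr_t)p & ~(uintptr_t)7);
}
```

With the two kernel addresses ffffffffbca364dd and ffffffffbca35d1e stored at offset 4, reading element 1 through the rounded-down pointer yields bca35d1effffffff instead of ffffffffbca35d1e. Values of that shape, ending in ffffffff, can be a hint that data is being read at a shifted offset.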

Thanks again in advance.

May 22 '24 16:05 faccimatteo

Doesn't the kernel just memcpy into the buffer you provide? How is stack_trace_t defined?

May 23 '24 16:05 danielocfb

stack_trace_t is defined as follows:

#define MAX_STACK_RAWTP 100

struct stack_trace_t {
    int pid;
    int kern_stack_size;
    int user_stack_size;
    __u64 kern_stack[MAX_STACK_RAWTP];
    __u64 user_stack[MAX_STACK_RAWTP];
};

It lives in an external header.
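The numbers in the gdb backtrace can be cross-checked against this layout. Here is a small sketch, assuming x86_64, where uint64_t (standing in here for __u64) is 8-byte aligned: four bytes of padding after user_stack_size place kern_stack at offset 16, so user_stack sits at 16 + 100 * 8 = 816 = 0x330. handle_event received stack_data=0x7ffff741000c, and 0x7ffff741000c + 0x330 = 0x7ffff741033c, exactly the stack pointer passed to show_stack_trace. This suggests bpf_get_stack() is not adding any offset: the whole record already starts 4 bytes past an 8-byte boundary. If I read libbpf's perf buffer code correctly, it hands the sample callback a pointer just past the 8-byte perf_event_header and a 4-byte size field, i.e. 12 bytes into an 8-byte-aligned record, which would account for the 4-byte shift. The helper below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STACK_RAWTP 100

/* Userspace mirror of stack_trace_t; uint64_t stands in for __u64. */
struct stack_trace_t {
    int pid;
    int kern_stack_size;
    int user_stack_size;
    uint64_t kern_stack[MAX_STACK_RAWTP];
    uint64_t user_stack[MAX_STACK_RAWTP];
};

/* On x86_64, 4 bytes of padding follow user_stack_size. */
static_assert(offsetof(struct stack_trace_t, kern_stack) == 16,
              "kern_stack expected at offset 16");

/* Hypothetical helper: where user_stack lands if the record starts at rec. */
static uintptr_t user_stack_addr(uintptr_t rec)
{
    return rec + offsetof(struct stack_trace_t, user_stack);
}
```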

However, I believe the perf event is misconfigured, as the PERF_COUNT_SW_CPU_CLOCK configuration samples per-CPU clock events.

Indeed, this is what I get from blazesym's symbolization:

ffffff8000000000: __per_cpu_end @ 0x2c468+0xffffff7ffffd3b98
adaa0e59ffffffff: __per_cpu_end @ 0x2c468+0xadaa0e59fffd3b97
ada9d74effffffff: __per_cpu_end @ 0x2c468+0xada9d74efffd3b97
adc0009bffffffff: __per_cpu_end @ 0x2c468+0xadc0009bfffd3b97

Switching the perf event configuration to PERF_COUNT_SW_BPF_OUTPUT doesn't seem to resolve the problem either, as I then don't receive any output from perfmap at all. When the process runs, I get the following event from the Android kernel:

[ 1191.687512] type=1400 audit(1716835216.702:41): avc: denied { bpf } for comm="eptracer" capability=39 scontext=u:r:su:s0 tcontext=u:r:su:s0 tclass=capability2 permissive=1

SELinux is already in permissive mode, so I'm trying to figure out why these events are not being sent to userspace.

May 27 '24 19:05 faccimatteo

It's entirely possible that it's a kernel bug, but I don't know much about Android. You can try using a ringbuf instead of a perfbuf and see if that eliminates the issue. Other than that, figure out who does the allocation and then reach out to them. Alternatively, if you just want to work around it, copy the stack trace into a properly aligned buffer that you allocated, and then pass that to blazesym.
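The copy-based workaround described in this comment can be sketched as follows. This is a minimal sketch with hypothetical helper names; it assumes, as in the BPF program earlier in the thread, that user_stack_size holds the byte count returned by bpf_get_stack() (negative on error) and that at most MAX_STACK_RAWTP entries are present. The destination is a plain uint64_t array, so the compiler guarantees its alignment, and memcpy reads the source regardless of its alignment. The result could then be passed to blaze_symbolize_process_abs_addrs() in place of the original pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_STACK_RAWTP 100

/* bpf_get_stack() returns a byte count (or a negative error); convert it
 * to a number of u64 entries, treating errors as an empty stack. */
static size_t stack_entries(long size_bytes)
{
    return size_bytes < 0 ? 0 : (size_t)size_bytes / sizeof(uint64_t);
}

/* Copies up to cap entries from src, which may be misaligned, into the
 * naturally aligned array dst. Returns the number of entries copied. */
static size_t copy_stack_aligned(uint64_t *dst, size_t cap,
                                 const void *src, size_t n)
{
    assert(dst != NULL && (src != NULL || n == 0));
    if (n > cap)
        n = cap;
    if (n > 0)
        memcpy(dst, src, n * sizeof(*dst));
    return n;
}
```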

May 29 '24 22:05 danielocfb

So, checking https://en.wikipedia.org/w/index.php?title=Data_structure_alignment&oldid=1209741102#Typical_alignment_of_C_structs_on_x86, it appears that 4-byte alignment for 8-byte words may actually be valid in C, meaning that C doesn't strictly require natural alignment. In that case, I am open to adjusting blazesym-c to relax its alignment requirements, as it's specifically designed for C interop.
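To illustrate the layout the linked Wikipedia section describes: on 32-bit x86 Linux, 8-byte integers inside structs only get 4-byte alignment. The sketch below emulates that rule with #pragma pack, which GCC, Clang, and MSVC support but which is not ISO C, and shows how a u64 array can start 4 bytes past an 8-byte boundary within its struct. The struct and helper are hypothetical; the compiler still emits correct loads for direct member accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cap member alignment at 4 bytes, mimicking the i386 System V struct
 * layout for 8-byte integers. */
#pragma pack(push, 4)
struct i386_like_record {
    uint32_t pid;
    uint64_t stack[4]; /* starts right after pid, at offset 4 */
};
#pragma pack(pop)

static_assert(offsetof(struct i386_like_record, stack) == 4,
              "no padding between pid and stack");

/* Direct member access: the compiler knows the member may be 4-aligned. */
static uint64_t read_stack_entry(const struct i386_like_record *r, size_t i)
{
    return r->stack[i];
}
```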

May 30 '24 17:05 danielocfb

Can you retry with https://github.com/libbpf/blazesym/commit/247c4c512ac784a99fbffde6ce07e85db9e15991 ? I'd think that will fix the issue.

May 30 '24 20:05 danielocfb

Hi Daniel, I can confirm that with 247c4c5 it now works as expected 🎉

Indeed, tracing the Dialer app, one of the backtraces I get is the following:

Kernel:
ffffffffffffff80: __this_module @ 0xffffffffc064a1c0+0x3f9b5dc0
ffffffffbca364dd: put_cpu_partial @ 0xffffffffbca36440+0x9d
ffffffffbca35d1e: __slab_free @ 0xffffffffbca35b60+0x1be
ffffffffbca3240c: kmem_cache_free @ 0xffffffffbca32020+0x3ec
ffffffffbca6cbfd: file_free_rcu @ 0xffffffffbca6cbd0+0x2d
ffffffffbc85ee9b: rcu_do_batch @ 0xffffffffbc85ec80+0x21b
ffffffffbc85e985: rcu_core @ 0xffffffffbc85e7e0+0x1a5
ffffffffbc857169: rcu_core_si @ 0xffffffffbc857160+0x9
ffffffffbda00140: __do_softirq @ 0xffffffffbda00010+0x130
ffffffffbc7b9440: __irq_exit_rcu @ 0xffffffffbc7b93f0+0x50
ffffffffbc7b93e9: irq_exit_rcu @ 0xffffffffbc7b93e0+0x9
ffffffffbd6a069f: sysvec_apic_timer_interrupt @ 0xffffffffbd6a0600+0x9f
ffffffffbd800d0b: asm_sysvec_apic_timer_interrupt @ 0xffffffffbd800cf0+0x1b
ffffffffc0344d71: goldfish_pipe_read_write @ 0xffffffffc0344b60+0x211
ffffffffc03441fb: goldfish_pipe_read @ 0xffffffffc03441f0+0xb
ffffffffbca68005: vfs_read @ 0xffffffffbca67f20+0xe5
ffffffffbca68bfc: ksys_read @ 0xffffffffbca68b90+0x6c
ffffffffbca68c96: __x64_sys_read @ 0xffffffffbca68c80+0x16
ffffffffbd69d77f: do_syscall_64 @ 0xffffffffbd69d730+0x4f
ffffffffbd80009b: entry_SYSCALL_64_after_hwframe @ 0xffffffffbd800038+0x63

Userspace:
fffffffffffffe00: <no-symbol>
0000782dd83631f7: read @ 0xb81f0+0x7

Now I need to investigate why I get fffffffffffffe00 as an address in the user stack.

Jun 04 '24 22:06 faccimatteo