
systemd-bootchart fails with ENOENT for "/proc/schedstat" when run from initial ramdisk

Open jamuir opened this issue 1 year ago • 3 comments

Executing systemd-bootchart from the initial ramdisk fails when systemd does its switch-root procedure.

This can be reproduced on Fedora 41 with an initial ramdisk updated to include systemd-bootchart.

The systemd-bootchart documentation does not say whether running from the initial ramdisk is supported. Internally, however, systemd-bootchart sets argv[0][0] = '@', which suggests it was supported at one point (setting argv[0][0] = '@' is one way to survive the switch-root process killing spree).
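For readers unfamiliar with the argv[0][0] = '@' convention, here is a minimal sketch of the idea. It is not the actual systemd-bootchart code; the helper name mark_survive_switch_root is hypothetical and exists only for illustration. It overwrites the first character of the process name in place, which is how systemd recognizes a process it should not kill when leaving the initial ramdisk.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not taken from systemd-bootchart): mark the
 * current process so that systemd skips it when it kills leftover
 * initrd processes before switch-root. systemd checks whether the
 * first character of argv[0] is '@'.
 */
static void mark_survive_switch_root(char **argv) {
    /* Only touch argv[0] if it exists and is a non-empty string. */
    if (argv != NULL && argv[0] != NULL && argv[0][0] != '\0')
        argv[0][0] = '@';
}
```

A program would call this at the top of main(), passing its own argv. The first character of the name is overwritten rather than prepended, so the process shows up with a changed name in tools such as ps.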

The failure happens here:

https://github.com/systemd/systemd-bootchart/blob/a15bcafb60b9a24d866024953e9965316ba73eaf/src/store.c#L191C1-L194C71

I will provide an strace log and more detailed steps to reproduce below.

jamuir, Jan 21 '25 19:01

strace log is attached.

strace-proc-schedstat.log

To prepare an initial ramdisk with systemd-bootchart (and strace), you can do this:

sudo -i
mkdir -p initrd/root
cd initrd/root
gunzip --stdout /boot/initramfs-6.11.4-301.fc41.aarch64.img | cpio --extract 
cd usr/lib/systemd
cp /usr/lib/systemd/systemd-bootchart .
# you can check that all required libs are already present
#   ldd /usr/lib/systemd/systemd-bootchart
cd ../..
cd bin
cp /usr/bin/strace .
# you will need to copy a few libs to support strace
#   ldd /usr/bin/strace
cd ../..
find . | cpio -o -H newc --file=../initramfs-xx.cpio
cd ..
gzip --stdout initramfs-xx.cpio > initramfs-xx.img
cp initramfs-xx.img /boot/initramfs-xx.img

Reboot and then edit the grub command to boot using the new initial ramdisk:

initrd ($root)/initramfs-xx.img

Also, add a kernel param to boot into the rd.emergency target (I also added enforcing=0):

$ xargs -n1 < /proc/cmdline 
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.11.4-301.fc41.aarch64
root=/dev/mapper/fedora_vbox-root
ro
rd.lvm.lv=fedora_vbox/root
rhgb
enforcing=0
rd.emergency

In the ramdisk emergency shell, run bootchart and then exit to continue booting:

# strace -o /run/log/strace.log /usr/lib/systemd/systemd-bootchart &
# exit

When you log in as normal, systemd-bootchart won't be running.

The strace log shows that systemd-bootchart failed attempting to read /proc/schedstat and then exited:

clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=32804124}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=1, si_uid=0} ---
restart_syscall(<... resuming interrupted clock_nanosleep ...>) = 0
lseek(4, 0, SEEK_SET)                   = 0
pread64(5, "nr_free_pages 478087\nnr_zone_ina"..., 4095, 0) = 3531
openat(AT_FDCWD, "/proc/schedstat", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
writev(2, [{iov_base="Unable to read schedstat: No suc"..., iov_len=51}, {iov_base="\n", iov_len=1}], 2) = -1 EIO (Input/output error)
getpid()                                = 241
close(3)                                = 0
close(4)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++

You can also reproduce the defect by setting the kernel param rdinit=:

rdinit=/usr/lib/systemd/systemd-bootchart

jamuir, Jan 22 '25 20:01

This certainly wasn't supported.

I think we can support it, though. We would probably have to rewrite all the proc opening code to open the "correct" proc folder: somehow detect and fall back to the "new" location of proc, and, instead of opening files by full path, use openat on the existing proc directory fd. It's likely going to be a little messy because, for each process, we will be opening files relative to the proc folder.

That's assuming that it actually works and the fd for /proc remains accessible after the switchroot.

sofar, Feb 09 '25 21:02

We have a patch that works the way you suggest; i.e. rather than use an absolute path, it holds a file descriptor to the original /proc (pre-switch-root) and then opens relative to that fd.
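To illustrate the approach (this is a rough sketch, not the patch itself), the idea is to open the proc directory once while the initial ramdisk is still the root, keep that fd, and resolve every later open relative to it with openat(). The helper names proc_dir_open, proc_open_file and proc_read_file are hypothetical, and the sketch assumes the held fd stays usable after switch-root, which is what the testing so far suggests.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open the proc directory once, before switch-root,
 * and keep the returned fd for the lifetime of the process. */
static int proc_dir_open(const char *path) {
    return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

/* Hypothetical helper: open a file relative to the held proc fd, e.g.
 * "schedstat" or "<pid>/stat", instead of using an absolute path. */
static int proc_open_file(int procfd, const char *name) {
    return openat(procfd, name, O_RDONLY | O_CLOEXEC);
}

/* Read up to len - 1 bytes from the start of a proc file into buf and
 * NUL-terminate it. Returns the byte count, or -1 on any error. */
static ssize_t proc_read_file(int procfd, const char *name,
                              char *buf, size_t len) {
    ssize_t n;
    int fd;

    if (len == 0)
        return -1;
    fd = proc_open_file(procfd, name);
    if (fd < 0)
        return -1;
    n = pread(fd, buf, len - 1, 0);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

With this shape, the sampling loop would call something like proc_read_file(procfd, "schedstat", buf, sizeof(buf)) on each iteration, so the lookup no longer depends on where /proc is mounted in the current root.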

It seems to work.

I will test it a bit more and then open a PR.

jamuir, Feb 09 '25 23:02