
soft-reboot --apply gets stuck if reboot fails to exec

Open shi2wei3 opened this issue 4 months ago • 4 comments

I'm not sure whether bootc-destructive-cleanup.service is what blocks soft-reboot --apply, but both services have failed.

Last login: Wed Sep 17 16:02:43 2025 from 192.168.122.1
[systemd]
Failed Units: 2
  bootc-destructive-cleanup.service
  run-p1601-i1901.service
[root@RHEL ~]# systemctl status bootc-destructive-cleanup.service
× bootc-destructive-cleanup.service - Cleanup previous the installation after an alongside installation
     Loaded: loaded (/usr/lib/systemd/system/bootc-destructive-cleanup.service; disabled; preset: disabled)
     Active: failed (Result: exit-code) since Wed 2025-09-17 16:03:45 UTC; 2min 44s ago
 Invocation: e497b9e66334478abea7e2ec87fa72ad
       Docs: man:bootc(8)
    Process: 827 ExecStart=/usr/lib/bootc/fedora-bootc-destructive-cleanup (code=exited, status=123)
   Main PID: 827 (code=exited, status=123)
   Mem peak: 375.4M
        CPU: 1min 5.483s

Sep 17 16:03:45 RHEL fedora-bootc-destructive-cleanup[1043]: warning: /etc/subgid saved as /etc/subgid.rpmsave
Sep 17 16:03:45 RHEL fedora-bootc-destructive-cleanup[1043]: warning: /etc/shadow saved as /etc/shadow.rpmsave
Sep 17 16:03:45 RHEL fedora-bootc-destructive-cleanup[1043]: warning: /etc/passwd saved as /etc/passwd.rpmsave
Sep 17 16:03:45 RHEL fedora-bootc-destructive-cleanup[1043]: warning: /etc/hosts saved as /etc/hosts.rpmsave
Sep 17 16:03:45 RHEL fedora-bootc-destructive-cleanup[1043]: warning: /etc/gshadow saved as /etc/gshadow.rpmsave
Sep 17 16:03:45 RHEL fedora-bootc-destructive-cleanup[1043]: warning: /etc/group saved as /etc/group.rpmsave
Sep 17 16:03:45 RHEL systemd[1]: bootc-destructive-cleanup.service: Main process exited, code=exited, status=123/n/a
Sep 17 16:03:45 RHEL systemd[1]: bootc-destructive-cleanup.service: Failed with result 'exit-code'.
Sep 17 16:03:45 RHEL systemd[1]: Failed to start bootc-destructive-cleanup.service - Cleanup previous the installation after an alongside installation.
Sep 17 16:03:45 RHEL systemd[1]: bootc-destructive-cleanup.service: Consumed 1min 5.483s CPU time, 375.4M memory peak.
[root@RHEL ~]# systemctl status run-p1601-i1901.service
× run-p1601-i1901.service - [systemd-run] /usr/bin/systemctl reboot "--message=Initiated by bootc"
     Loaded: loaded (/run/systemd/transient/run-p1601-i1901.service; transient)
  Transient: yes
     Active: failed (Result: exit-code) since Wed 2025-09-17 16:03:33 UTC; 3min 19s ago
   Duration: 56ms
 Invocation: b2debc71c72e406486fe36935e1c6891
    Process: 1603 ExecStart=/usr/bin/systemctl reboot --message=Initiated by bootc (code=exited, status=1/FAILURE)
   Main PID: 1603 (code=exited, status=1/FAILURE)
   Mem peak: 1.4M
        CPU: 8ms

Sep 17 16:03:33 RHEL systemd[1]: Started run-p1601-i1901.service - [systemd-run] /usr/bin/systemctl reboot "--message=Initiated by bootc".
Sep 17 16:03:33 RHEL systemctl[1603]: Call to Reboot failed: Access denied
Sep 17 16:03:33 RHEL systemd[1]: run-p1601-i1901.service: Main process exited, code=exited, status=1/FAILURE
Sep 17 16:03:33 RHEL systemd[1]: run-p1601-i1901.service: Failed with result 'exit-code'.
[root@RHEL ~]#

bootc switch --soft-reboot=required --apply gets stuck here:

# bootc switch <soft-reboot-capable-image> --soft-reboot=required --apply
layers already present: 66; layers needed: 2 (14.4 MB)
Fetched layers: 13.69 MiB in 1 second (9.42 MiB/s)
  Deploying: done (4 seconds)
Queued for next boot: quay.io/wshi/centos-bootc:arm64
  Version: 10
  Digest: sha256:931e57f8012f687acfa158034d5ceff2e2ef84fc675b5028b93bcfb43a38c16e
Staged deployment is soft-reboot capable, preparing for soft-reboot...


After a manual reboot, the system boots successfully into the target bootc image, but the failed systemd units are still listed:

[systemd]
Failed Units: 2
  bootc-destructive-cleanup.service
  run-p1601-i1901.service

shi2wei3 · Sep 17 '25 16:09

The impact is very low: it's easy to find the cause in the system log and then reboot manually. I didn't see a good way to redirect the systemctl reboot output to the soft-reboot command.

shi2wei3 · Sep 19 '25 02:09

Hmm, is this from a to-existing-root install, then a reboot, and then another update after that?

Sep 17 16:03:33 RHEL systemctl[1603]: Call to Reboot failed: Access denied

That's weird. Do you see anything in systemd-inhibit --list here?

cgwalters · Sep 19 '25 11:09

Not purely to-existing-root. To reproduce it you need to use s-r-b, which triggers bootc-destructive-cleanup.service on first boot. This is a corner case.

# systemctl reboot
Operation inhibited by "RPM" (PID 1210 "rpm", user root), reason is "Transaction running".
Please retry operation after closing inhibitors and logging out other users.
'systemd-inhibit' can be used to list active inhibitors.
Alternatively, ignore inhibitors and users with 'systemctl reboot -i'.
# systemd-inhibit --list
WHO            UID USER PID  COMM           WHAT                WHY                                       MODE
NetworkManager 0   root 988  NetworkManager sleep               NetworkManager needs to turn off networks delay
RPM            0   root 1210 rpm            shutdown:sleep:idle Transaction running                       block

PID 1210 is the rpm process that is cleaning up the packages from the old system.

shi2wei3 · Sep 20 '25 07:09
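
The inhibitor list in the comment above has one entry that matters for the failed reboot: a row whose MODE is block and whose WHAT column includes shutdown. A minimal shell sketch for picking out such rows follows. The function name blocking_shutdown_inhibitors is hypothetical, and the parsing assumes the table layout shown above (a header row with WHO, UID, WHAT, WHY and MODE columns); it is not taken from bootc.

```shell
# Hypothetical helper, not part of bootc: read `systemd-inhibit --list`
# output on stdin and print "WHO: WHY" for every inhibitor that blocks
# shutdown. Returns 0 if at least one such inhibitor is found, 1 otherwise.
blocking_shutdown_inhibitors() {
    awk '
        # Header row: remember where each column starts.
        NR == 1 {
            uid  = index($0, "UID")
            what = index($0, "WHAT")
            why  = index($0, "WHY")
            mode = index($0, "MODE")
            next
        }
        # Data rows: slice columns by the header offsets, because the
        # WHO and WHY values may themselves contain spaces.
        uid && what && why && mode {
            who  = trim(substr($0, 1, uid - 1))
            lock = trim(substr($0, what, why - what))
            note = trim(substr($0, why, mode - why))
            m    = trim(substr($0, mode))
            # WHAT is a colon-separated list such as shutdown:sleep:idle.
            if (m == "block" && index(":" lock ":", ":shutdown:")) {
                print who ": " note
                found = 1
            }
        }
        function trim(s) { sub(/^ +/, "", s); sub(/ +$/, "", s); return s }
        END { exit !found }
    '
}

# Intended use on a live system (not executed here):
#   if systemd-inhibit --list | blocking_shutdown_inhibitors; then
#       echo "reboot is currently inhibited" >&2
#   fi
```

Slicing by header offsets rather than by whitespace-separated fields avoids misreading a WHY text that happens to contain the word "shutdown".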

OK yes, tricky. I think it'd be a good idea for us to check the reboot inhibitors up front if --apply is in use and error out as a general rule. However, that still leaves a race condition: an inhibitor could appear between the check and the reboot request. I think to close that race we'd need to default to monitoring the systemd-run invocation and propagate any errors.

cgwalters · Sep 20 '25 20:09
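
The idea of monitoring the systemd-run invocation and propagating errors can be sketched in shell as follows. This is only an illustration under assumptions: the function name request_reboot is hypothetical, the exact systemd-run options bootc uses are not shown in this thread, and the default command below simply combines the systemctl reboot call from the journal with systemd-run's --wait, --pipe, --quiet and --collect options. A caller may pass a different command as arguments, which is what makes the sketch testable without rebooting anything.

```shell
# Hypothetical helper, not bootc's implementation: issue the reboot
# request, wait for the result, and surface any failure to the caller
# instead of waiting indefinitely for a reboot that was refused.
request_reboot() {
    # With no arguments, fall back to an assumed invocation:
    #   --wait  blocks until the transient unit exits and propagates its status
    #   --pipe  connects the unit's stdout/stderr to ours, so the systemctl
    #           error message can be captured
    if [ "$#" -eq 0 ]; then
        set -- systemd-run --wait --pipe --quiet --collect -- \
            systemctl reboot "--message=Initiated by bootc"
    fi

    local output status
    # On success the machine goes down and this never returns to the caller
    # in practice; returning 0 covers substituted commands.
    output=$("$@" 2>&1) && return 0
    status=$?

    printf 'reboot request failed (exit %d): %s\n' "$status" "$output" >&2
    return "$status"
}
```

With a wrapper like this, the "Call to Reboot failed: Access denied" message from the journal would reach the terminal of whoever ran the --apply command, together with a non-zero exit status.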