Failed pstore modules with core22 20220706
We are testing core22 20220706 in beta and hitting this issue frequently on both amd64 and arm devices. It doesn't happen 100% of the time, but it seems to be pretty easy to hit, and easy to reproduce by simply installing core22 from beta channel and rebooting. After rebooting, it looks like there are quite a few failed systemd units if you run this:
$ systemctl --system --no-ask-password --no-pager list-units --state=failed
Failed units:
UNIT LOAD ACTIVE SUB DESCRIPTION
● modprobe@chromeos_pstore.service loaded failed failed Load Kernel Module chromeos_pstore
● modprobe@efi_pstore.service loaded failed failed Load Kernel Module efi_pstore
● [email protected] loaded failed failed Load Kernel Module mtdpstore
● modprobe@pstore_blk.service loaded failed failed Load Kernel Module pstore_blk
● modprobe@pstore_zone.service loaded failed failed Load Kernel Module pstore_zone
● [email protected] loaded failed failed Load Kernel Module ramoops
This persists across future reboots, but if I revert back to the previous stable core22, the problem goes away.
In latest systemd update in jammy, these two lines (that come from https://github.com/systemd/systemd/commit/70e74a5997ae2ce7ba72a74ac949c3b2dad1a1d6) are added to systemd-pstore.service:
After=modprobe@efi_pstore.service [email protected] modprobe@chromeos_pstore.service [email protected] modprobe@pstore_zone.service modprobe@pstore_blk.service
Wants=modprobe@efi_pstore.service [email protected] modprobe@chromeos_pstore.service [email protected] modprobe@pstore_zone.service modprobe@pstore_blk.service
The intention is to load any possible pstore provider before trying to start systemd-pstore. Note that actually if the loading of the kernel module fails, that should be fine as - is used as a prefix in [email protected] (this makes the status successful even if modprobe returns with error):
...
[Service]
Type=oneshot
ExecStart=-/sbin/modprobe -abq %I
The interesting thing is that these services do not fail because of a failure to load the module, but because an excessive number of retries:
$ sudo journalctl -u modprobe@chromeos_pstore.service
Jul 20 09:02:37 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
Jul 20 09:02:37 ubuntu systemd[1]: Finished Load Kernel Module chromeos_pstore.
Jul 20 09:02:38 ubuntu systemd[1]: Starting Load Kernel Module chromeos_pstore...
Jul 20 09:02:38 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
Jul 20 09:02:38 ubuntu systemd[1]: Finished Load Kernel Module chromeos_pstore.
Jul 20 09:02:38 ubuntu systemd[1]: Starting Load Kernel Module chromeos_pstore...
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
Jul 20 09:02:39 ubuntu systemd[1]: Finished Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: Starting Load Kernel Module chromeos_pstore...
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
Jul 20 09:02:39 ubuntu systemd[1]: Finished Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: Starting Load Kernel Module chromeos_pstore...
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
Jul 20 09:02:39 ubuntu systemd[1]: Finished Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Start request repeated too quickly.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Failed with result 'start-limit-hit'.
Jul 20 09:02:39 ubuntu systemd[1]: Failed to start Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Start request repeated too quickly.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Failed with result 'start-limit-hit'.
Jul 20 09:02:39 ubuntu systemd[1]: Failed to start Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Start request repeated too quickly.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Failed with result 'start-limit-hit'.
Jul 20 09:02:39 ubuntu systemd[1]: Failed to start Load Kernel Module chromeos_pstore.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Start request repeated too quickly.
Jul 20 09:02:39 ubuntu systemd[1]: modprobe@chromeos_pstore.service: Failed with result 'start-limit-hit'.
Jul 20 09:02:39 ubuntu systemd[1]: Failed to start Load Kernel Module chromeos_pstore.
And this happens because systemd-pstore.service is being called a few consecutive times (each of these lines is a restart):
$ sudo journalctl -u systemd-pstore.service
Jul 20 09:02:37 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Jul 20 09:02:39 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Jul 20 09:02:39 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Jul 20 09:02:39 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Jul 20 09:02:39 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Jul 20 09:02:39 ubuntu systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Interestingly, I see the same on my laptop with jammy:
$ sudo journalctl -b 0 -u systemd-pstore.service
jul 20 07:49:39 numancia systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
jul 20 07:49:40 numancia systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
jul 20 07:49:40 numancia systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
jul 20 07:49:41 numancia systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
It is just that the number of restarts is smaller so it does not trigger errors in the services loading kernel modules because of excessive number of retries.
To summarize, the problem here is that systemd-pstore.service is being tried too many times, we need to understand why that happens. It should be tried once, and as the condition ConditionDirectoryNotEmpty=/sys/fs/pstore fails, it should just stop there. (Trying to load modules is to try to have something in /sys/fs/pstore before starting the service.)
In latest systemd update in jammy, these two lines (that come from systemd/systemd@70e74a5) are added to
systemd-pstore.service:
This commit has not yet been released upstream. On jammy, it is applied downstream.
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1978079
Unfortunately this one is not applied: https://github.com/systemd/systemd/commit/9625350e5381a68c1179ae4581e7586c206663e1
I suggest we try with this patch added.
LP bug for systemd: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1982462