
[UPG FAIL] Linode VM - Prompt if detected, recommend disabling Lassie (shutdown watchdog)

Open lsthompson opened this issue 2 years ago • 7 comments

Hi there,

I appreciate this isn't a direct flaw in the elevate script, but a warning that offers a chance to bail out may be wise for anyone running a Linode who wants to elevate their machine. Linode enables Lassie, a reboot watchdog, on all machines by default.

We cleared the blockers and proceeded, but Lassie then performed several hard reboots during critical phases of the upgrade process. This left us with an inoperable boot environment and a server rebuild.

Might be wise to add a warning to the elevate script, allowing Linode users to check for and disable Lassie first.

https://www.linode.com/docs/products/compute/compute-instances/guides/lassie-shutdown-watchdog/
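A minimal sketch of what such a pre-flight warning could look like. This is not elevate's actual code (elevate itself is not written in Python), and it assumes that a Linode guest reports "Linode" in its DMI vendor string at `/sys/class/dmi/id/sys_vendor`, which is not confirmed in this thread. The function names are hypothetical. Whether Lassie is actually enabled is not checked here; the sketch only detects the platform and asks the user to verify.

```python
from pathlib import Path
from typing import Optional

# Assumption: Linode guests expose "Linode" in the DMI system vendor field.
DMI_VENDOR_PATH = Path("/sys/class/dmi/id/sys_vendor")


def read_dmi_vendor(path: Path = DMI_VENDOR_PATH) -> str:
    """Return the DMI vendor string, or "" if it cannot be read."""
    try:
        return path.read_text(encoding="utf-8", errors="replace").strip()
    except OSError:
        return ""


def is_linode_vendor(vendor: str) -> bool:
    """Case-insensitive match on the (assumed) Linode vendor string."""
    return "linode" in vendor.lower()


def lassie_warning(vendor: str) -> Optional[str]:
    """Return a warning for Linode hosts, or None on any other platform."""
    if not is_linode_vendor(vendor):
        return None
    return (
        "This looks like a Linode. Lassie (the shutdown watchdog) is enabled "
        "by default; check its setting before continuing with the upgrade."
    )


if __name__ == "__main__":
    message = lassie_warning(read_dmi_vendor())
    if message:
        print(message)
```

Splitting the file read from the string match keeps the check easy to exercise on a machine that is not a Linode.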

Just a thought, Luke

lsthompson avatar May 25 '23 00:05 lsthompson

That certainly might explain why my latest testing efforts on Linode in fact failed.

Will do some testing myself with lassie disabled tonight or tomorrow morning, see if that actually fixes things. If so, a PR will be forthcoming quickly.

troglodyne avatar May 25 '23 00:05 troglodyne

Hmm. As far as I can tell, this doesn't really seem to matter. Either way the upgrade fails and you wind up in a grub shell via lish, having to try to boot into something. All disabling lassie really does is cause yourself a pain in the ass, since you have to manually boot the VM every time it wants to reboot. That's just the way linode is: any reboot is just a shutdown unless lassie is enabled.

Anyways, what I get is the following in grub shell:

error: file `/boot/grub/i386-pc/increment.mod' not found.
error: file `/boot/grub/i386-pc/blscfg.mod' not found.
error: can't find command `blscfg'.
error: file `/boot/grub/grubenv' not found.
error: file `/boot/grub/i386-pc/increment.mod' not found.
error: file `/boot/grub/i386-pc/blscfg.mod' not found.
error: can't find command `blscfg'.
error: file `/boot/grub/grubenv' not found.

This, of course, makes sense, as:

grub> ls (hd0)/boot/grub
grub.cfg

Nobody home. Basically have to boot manually since grub config is tango uniform.
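To make the same observation from a rescue environment rather than the grub shell, a small sketch like the following could list which of the files named in the errors above are absent. The function name is hypothetical, and the directory to inspect is passed in because the mount point of the broken root inside a rescue system is an assumption that will vary.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the grub error messages quoted above.
EXPECTED_GRUB_FILES = (
    "i386-pc/increment.mod",
    "i386-pc/blscfg.mod",
    "grubenv",
)


def missing_grub_files(grub_dir: str) -> List[str]:
    """Return the expected files that do not exist under grub_dir."""
    base = Path(grub_dir)
    return [rel for rel in EXPECTED_GRUB_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the affected root is mounted so that /boot/grub is visible.
    for rel in missing_grub_files("/boot/grub"):
        print(f"missing: {rel}")
```

On a directory holding only `grub.cfg`, as in the listing above, all three entries would be reported as missing.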

set root=(hd0,1)
linux /boot/vmlinuz-4.18.0-477.10.1.el8_8.x86_64 root=/dev/sda1
initrd /boot/initramfs-4.18.0-477.10.1.el8_8.x86_64.img
boot

...will at least boot the machine at that point, though dracut is not happy and considers it an emergency if you check glish: [Screenshot at 2023-05-25 11-01-43]

Will keep investigating, at the least to see if there's some way to give users a good way to work around things when it just absolutely explodes like this.

troglodyne avatar May 25 '23 16:05 troglodyne

Yep. Workaround boot is

set root=(hd0)
linux /boot/vmlinuz-4.18.0-477.10.1.el8_8.x86_64 root=/dev/disk/by-label/linode-root
initrd /boot/initramfs-4.18.0-477.10.1.el8_8.x86_64.img
boot

Presumably we would need to just keep doing that till it is done, then repair grub afterwards. We shall see. Definitely looks like there are some things we might be able to work around here to avoid it.

troglodyne avatar May 25 '23 16:05 troglodyne

So after stage 5 we get that as expected, though we then enter a different failure mode via dracut:

Failed to switch root: Specified root path '/sysroot' does not seem to be an OS tree.

I can certainly see why the normal reaction to this would just be hitting eject. Presumably here it just needs to be mounting the disk properly, as that certainly is possible within rescue terminal. Continuing...

troglodyne avatar May 25 '23 17:05 troglodyne

So eventually we get a workable system that stops rebooting and reports great success. After that just a matter of repair, judging by a forums thread here on this specific issue (lol): https://almalinux.discourse.group/t/how-to-repair-rebuild-grub-following-a-cross-upgrade-from-centos-7-to-almalinux-8/1268

Presumably the "suggested workaround" there will be key to whatever approach we take, whether that is avoiding the problem or giving a more appropriate blocker message.

troglodyne avatar May 25 '23 17:05 troglodyne

So, after investigating previous work we did around the blocker for GRUB_ENABLE_BLSCFG, I have concluded that the blocker is entirely unnecessary, because CentOS 7 doesn't install this value to the default grub config anyway. We still have blocker code there, but it will never fire. Instead, you get splashdown on upgrade because the new config shipped with AlmaLinux 8 sets this as a default. So this has to be addressed before the relevant reboot, rather than relying on a blocker that fails to catch it ahead of time.
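For illustration, here is a rough sketch of a check for that setting, written against the text of a grub defaults file (commonly `/etc/default/grub`, which is an assumption here; the thread does not name the file). The function name is hypothetical and this is not the blocker code in elevate. It only shows why a check run on a stock CentOS 7 config would never fire, while the same check after the new config lands would.

```python
def blscfg_enabled(grub_defaults_text: str) -> bool:
    """Return True if GRUB_ENABLE_BLSCFG is set to true in the given text.

    The file is shell-style KEY=VALUE lines, so the last assignment wins
    and commented-out lines are ignored.
    """
    enabled = False
    for raw_line in grub_defaults_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "GRUB_ENABLE_BLSCFG":
            # Tolerate optional single or double quotes around the value.
            enabled = value.strip().strip("'\"").lower() == "true"
    return enabled


if __name__ == "__main__":
    before_upgrade = 'GRUB_TIMEOUT=5\nGRUB_DEFAULT=saved\n'
    after_upgrade = before_upgrade + 'GRUB_ENABLE_BLSCFG=true\n'
    print(blscfg_enabled(before_upgrade))
    print(blscfg_enabled(after_upgrade))
```

A pre-upgrade run sees no such key and reports nothing, which matches the observation that the existing blocker cannot catch the problem; the check only becomes meaningful once the new config is in place, that is, after the leapp run and before the reboot.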

troglodyne avatar May 26 '23 00:05 troglodyne

First attempt at a post-leapp fix has failed, possibly because it executes later than needed. Need to ensure this happens while we are booted into single user mode: after the leapp run but before the reboot.

troglodyne avatar May 26 '23 01:05 troglodyne