
stub: Allow booting compressed aarch64 Linux kernel images

Status: Open. martinezjavier opened this issue 3 years ago • 40 comments

Is your feature request related to a problem? Please describe.

The Linux kernel does not have a decompressor for the aarch64 architecture. As the arm64 booting documentation says:

3. Decompress the kernel image
------------------------------

Requirement: OPTIONAL

The AArch64 kernel does not currently provide a decompressor and
therefore requires decompression (gzip etc.) to be performed by the boot
loader if a compressed Image target (e.g. Image.gz) is used.  For
bootloaders that do not implement this requirement, the uncompressed
Image target is available instead.

This means that if a Linux kernel image is shipped compressed, linuxaa64.efi.stub cannot boot the image embedded in its .linux section.
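For illustration: the arm64 booting document specifies a 64-byte header at the start of the uncompressed `Image`, with the magic value `0x644d5241` ("ARM\x64", little endian) at offset 0x38. With the EFI stub enabled, the image also starts with "MZ" so that firmware can load it as PE/COFF. A gzip'ed `Image.gz` carries none of this, which is why the embedded payload cannot simply be jumped into. The sketch below checks for that header. It is a minimal sketch assuming the documented header layout; the function and struct names are hypothetical and not taken from systemd.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers byte by byte, so unaligned buffers are fine. */
static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t rd_le64(const uint8_t *p) {
    return (uint64_t)rd_le32(p) | (uint64_t)rd_le32(p + 4) << 32;
}

/* Hypothetical subset of the arm64 Image header (Documentation/arm64/booting.rst). */
struct arm64_image_info {
    uint64_t text_offset;  /* offset 0x08 */
    uint64_t image_size;   /* offset 0x10 */
    bool has_efi_stub;     /* "MZ" at offset 0: the Image doubles as a PE/COFF file */
};

/* Returns true only for an uncompressed arm64 Image; a compressed blob fails the magic check. */
static bool parse_arm64_image(const uint8_t *buf, size_t len, struct arm64_image_info *out) {
    if (len < 64)
        return false;
    if (rd_le32(buf + 0x38) != 0x644d5241u)  /* "ARM\x64" */
        return false;
    out->text_offset = rd_le64(buf + 0x08);
    out->image_size = rd_le64(buf + 0x10);
    out->has_efi_stub = buf[0] == 'M' && buf[1] == 'Z';
    return true;
}
```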

Describe the solution you'd like

While it is possible to use uncompressed Linux kernel images, the stub should be able to decompress them if needed, since most distributions ship compressed kernel images for all supported architectures and most boot loaders are able to decompress them.

Describe alternatives you've considered

The UEFI specification mentions an EFI_DECOMPRESS_PROTOCOL.Decompress() service that the stub could use to decompress the kernel image instead of implementing its own decompressor.
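For reference, the protocol is a two-step API: GetInfo() reports the decompressed size and the scratch-buffer size for a given source buffer, and Decompress() then fills caller-provided buffers. A minimal sketch of that calling pattern follows. It uses simplified stand-in types instead of the real UEFI headers (no EFIAPI, malloc() in place of AllocatePool(), made-up status codes), plus a purely hypothetical toy "stored" decompressor so that it runs outside firmware.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins; real code would use EFI_STATUS and the firmware's headers. */
typedef int status_t;
enum { ST_OK = 0, ST_INVALID = 1, ST_NO_MEMORY = 2 };

typedef struct decompress_protocol decompress_protocol;
struct decompress_protocol {
    /* Modeled on GetInfo(This, Source, SourceSize, DestinationSize, ScratchSize). */
    status_t (*get_info)(decompress_protocol *self, void *src, uint32_t src_size,
                         uint32_t *dst_size, uint32_t *scratch_size);
    /* Modeled on Decompress(This, Source, SourceSize, Destination, DestinationSize,
       Scratch, ScratchSize). */
    status_t (*decompress)(decompress_protocol *self, void *src, uint32_t src_size,
                           void *dst, uint32_t dst_size, void *scratch, uint32_t scratch_size);
};

/* Query sizes first, then decompress into freshly allocated buffers. */
static status_t decompress_payload(decompress_protocol *p, void *src, uint32_t src_size,
                                   void **out, uint32_t *out_size) {
    uint32_t dst_size = 0, scratch_size = 0;
    status_t s = p->get_info(p, src, src_size, &dst_size, &scratch_size);
    if (s != ST_OK)
        return s;
    void *dst = malloc(dst_size ? dst_size : 1);
    void *scratch = malloc(scratch_size ? scratch_size : 1);
    if (!dst || !scratch) {
        free(dst);
        free(scratch);
        return ST_NO_MEMORY;
    }
    s = p->decompress(p, src, src_size, dst, dst_size, scratch, scratch_size);
    free(scratch);
    if (s != ST_OK) {
        free(dst);
        return s;
    }
    *out = dst;
    *out_size = dst_size;
    return ST_OK;
}

/* Toy format, for testing only: 4-byte little-endian length followed by raw bytes. */
static status_t stored_get_info(decompress_protocol *self, void *src, uint32_t src_size,
                                uint32_t *dst_size, uint32_t *scratch_size) {
    const uint8_t *b = src;
    (void)self;
    if (src_size < 4)
        return ST_INVALID;
    *dst_size = (uint32_t)b[0] | (uint32_t)b[1] << 8 | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    *scratch_size = 0;
    return *dst_size <= src_size - 4 ? ST_OK : ST_INVALID;
}

static status_t stored_decompress(decompress_protocol *self, void *src, uint32_t src_size,
                                  void *dst, uint32_t dst_size, void *scratch, uint32_t scratch_size) {
    (void)self; (void)src_size; (void)scratch; (void)scratch_size;
    memcpy(dst, (const uint8_t *)src + 4, dst_size);
    return ST_OK;
}

static decompress_protocol stored_decompressor = { stored_get_info, stored_decompress };
```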

The systemd version you checked that didn't have the feature you are asking for

Latest upstream main branch.

martinezjavier, Jun 20 '22

#23347 is the matching RFE for the spec.

keszybz, Jun 23 '22

Hmm, so there are two approaches here:

  1. first approach: implement this in sd-boot (what I had in mind in #23347): i.e. instead of just invoking the EFI binary, decompress it into memory first, make a file system protocol object from it, then run it.

  2. second approach: implement this in sd-stub (what I think this issue suggests): i.e. leave the EFI stub uncompressed, but include the kernel in compressed form in the outer PE image. Then decompress it from the stub, and synthesize the file system protocol object there before invoking it.

I figure the second approach is nicer: it means the firmware can invoke our images directly, i.e. the files become entirely self-contained, which I think is good. It also means the sysext/creds pick-up that we do in sd-stub continues to work just fine: the fs protocol object the stub PE binary is invoked from is a real fs one, so we can use it to enumerate files adjacent to the image. If we had already synthesized an fs protocol object at that point, this would not be as easy.
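To make the second approach concrete: the stub has to find the embedded kernel inside its own PE image, which boils down to walking the PE/COFF section table for a named section such as `.linux`. Below is a minimal sketch over a PE file held in memory, using file offsets (PointerToRawData); code running from an already-loaded image would use the section's VirtualAddress instead. The helper names are hypothetical, not sd-stub's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd_le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

struct pe_section_ref {
    uint32_t file_offset;  /* PointerToRawData */
    uint32_t size;         /* SizeOfRawData */
};

/* Look up a section such as ".linux" in a PE/COFF file held in memory. */
static bool pe_find_section(const uint8_t *buf, size_t len, const char *name,
                            struct pe_section_ref *out) {
    size_t name_len = strlen(name);
    if (name_len > 8 || len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return false;
    uint32_t pe = rd_le32(buf + 0x3c);              /* e_lfanew */
    if (pe > len || len - pe < 24 || memcmp(buf + pe, "PE\0\0", 4) != 0)
        return false;
    uint16_t nsec = rd_le16(buf + pe + 6);          /* NumberOfSections */
    uint16_t opt_size = rd_le16(buf + pe + 20);     /* SizeOfOptionalHeader */
    size_t sec = (size_t)pe + 24 + opt_size;        /* section table follows */
    for (uint16_t i = 0; i < nsec; i++, sec += 40) {
        if (sec > len || len - sec < 40)
            return false;
        /* Names are 8 bytes, NUL-padded but not necessarily NUL-terminated. */
        if (memcmp(buf + sec, name, name_len) == 0 &&
            (name_len == 8 || buf[sec + name_len] == '\0')) {
            out->size = rd_le32(buf + sec + 16);
            out->file_offset = rd_le32(buf + sec + 20);
            return out->file_offset <= len && out->size <= len - out->file_offset;
        }
    }
    return false;
}
```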

poettering, Jul 04 '22

Mind you, EFI_DECOMPRESS_PROTOCOL is zlib only.

medhefgo, Jul 04 '22

Yeah, we'd have to embed a minimal copy of zlib decompression (or whatever format is en vogue today) in our codebase, I fear. Yuck.

poettering, Jul 04 '22

> Yeah, we'd have to embed a minimal copy of zlib decompression (or whatever format is en vogue today) in our codebase, I fear. Yuck.

You mean zstd here, right? The EFI_DECOMPRESS_PROTOCOL is provided by the firmware. :P

We don't need to vendor those, in theory. Providing a path to a static lib during build should be enough so that we can link against it.

medhefgo, Jul 04 '22

> We don't need to vendor those, in theory. Providing a path to a static lib during build should be enough so that we can link against it.

Well, it depends: if they rely on libc symbols (and I think zlib does, it provides fopen() wrappers, no?), then a regular distro's static lib won't work. I doubt there's a way around vendoring this.

But I am curious: what is actually en vogue as a compressor for this use case currently? gz? zstd? xz? The fewer we have to support, the better.

poettering, Jul 04 '22

> The EFI_DECOMPRESS_PROTOCOL is provided by the firmware. :P

Yeah, I'd ignore this bit. It sounds risky to rely on that, given that such decompressors are parsers and hence security-sensitive.

poettering, Jul 04 '22

There's a third option, to convince the Linux aarch64 maintainers to add a decompressor in the kernel.

They were against doing it in the past, e.g. http://lists.infradead.org/pipermail/linux-arm-kernel/2014-January/225277.html, but really it's silly that all bootloaders need to vendor decompression libraries and catch up with whatever format is used.

martinezjavier, Jul 04 '22

En vogue is zstd today. I even looked at the headers and it seems to provide a sane minimalist API that wouldn't require any libc stuff (afaik). The question is whether their build system supports that use case out of the box.

> Yeah, I'd ignore this bit. It sounds risky to rely on that, given that such decompressors are parsers and hence security-sensitive.

That would not be an issue under secure boot.

medhefgo, Jul 04 '22

Of course, I'd prefer not having to deal with all this. I don't really understand why the decompression code that apparently already exists in the kernel and is regularly used on x86 can't be enabled for arm too.

poettering, Jul 04 '22

To make our lives miserable?

medhefgo, Jul 04 '22

> Of course, I'd prefer not having to deal with all this. I don't really understand why the decompression code that apparently already exists in the kernel and is regularly used on x86 can't be enabled for arm too.

Me neither... other than the answer in the email thread that I referenced before.

They also documented it in https://www.kernel.org/doc/html/latest/arm64/booting.html#decompress-the-kernel-image

martinezjavier, Jul 04 '22

@poettering @medhefgo maybe a silly thought, but what about instead just documenting in the BootLoaderSpec that on some architectures Linux has this limitation and cannot self-decompress? And making it explicit that sd-stub isn't able to decompress, so if someone wants to use BLS type 2 (unified kernels) on these architectures, they must use an uncompressed kernel?

The Linux kernel supports a bunch of compression formats (gzip, bzip2, lzma, xz, lzo, lz4 and zstd), so even if sd-stub supports whatever is en vogue (zstd IIUC), there will be cases where it won't work anyway, depending on the compression algorithm used by the distro.
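Supporting every format means carrying one decompressor per format; recognizing which one a payload uses is the easy part, since most of these containers start with a fixed magic number. The sketch below sniffs the leading bytes using the standard magic values of each container format, under two stated assumptions: lz4 is expected in the legacy frame format (as written by `lz4 -l`), and legacy `.lzma` is left out because it has no fixed magic. The function and table names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Leading magic bytes of common compression container formats. */
static const struct {
    const char *name;
    uint8_t magic[6];
    size_t len;
} known_formats[] = {
    { "gzip",  { 0x1f, 0x8b }, 2 },
    { "bzip2", { 'B', 'Z', 'h' }, 3 },
    { "xz",    { 0xfd, '7', 'z', 'X', 'Z', 0x00 }, 6 },
    { "lzo",   { 0x89, 'L', 'Z', 'O', 0x00 }, 5 },   /* lzop container */
    { "lz4",   { 0x02, 0x21, 0x4c, 0x18 }, 4 },      /* lz4 legacy frame (assumed) */
    { "zstd",  { 0x28, 0xb5, 0x2f, 0xfd }, 4 },
    /* Legacy .lzma has no fixed magic number, so it cannot be sniffed this way. */
};

/* Returns the format name, or NULL if the buffer does not start with a known magic. */
static const char *sniff_compression(const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < sizeof known_formats / sizeof known_formats[0]; i++) {
        if (len >= known_formats[i].len &&
            memcmp(buf, known_formats[i].magic, known_formats[i].len) == 0)
            return known_formats[i].name;
    }
    return NULL;
}
```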

martinezjavier, Jul 07 '22

Just my 3 cents: u-boot does not handle compressed initrd images https://github.com/u-boot/u-boot/blob/b960d654cbad172ba43229b3990f0b8d3a134f7a/lib/efi_loader/efi_image_loader.c#L838

tpgxyz, Jul 10 '22

The reason for avoiding a decompressor in the arm64 tree is not to make your lives miserable, I can assure you :-)

The problem is not decompression per se - the problem is that you need to bootstrap another minimal execution environment, which needs to be built, linked and executed so you can invoke the decompression algorithm. On ARM, this needs the MMU and caches to be enabled, or it will not only be dead slow, but also reject things like unaligned accesses and zero-by-cacheline (DC ZVA) instructions. Since we cannot just map the entire address space with cacheable attributes (as speculative instruction fetches or data accesses from device regions must be avoided), we now have to parse either the device tree (DT) or the EFI memory map to find out where the memory lives to begin with, in order to map it.
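As a rough illustration of the memory-map step described above: a decompressor running under EFI would walk the memory map returned by GetMemoryMap() and pick out the RAM it may map as cacheable. The sketch assumes the UEFI EFI_MEMORY_DESCRIPTOR field layout and, for simplicity, treats only EfiConventionalMemory regions carrying the EFI_MEMORY_WB attribute as mappable; a real implementation would also have to cover regions such as loader code and data. Note the stride: descriptors must be stepped by the firmware-reported descriptor size, which may be larger than the C struct. All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field layout of EFI_MEMORY_DESCRIPTOR from the UEFI spec (names simplified). */
typedef struct {
    uint32_t type;
    uint64_t physical_start;
    uint64_t virtual_start;
    uint64_t number_of_pages;   /* 4 KiB EFI pages */
    uint64_t attribute;
} efi_memory_descriptor;

#define EFI_CONVENTIONAL_MEMORY 7
#define EFI_MEMORY_WB 0x8ULL
#define EFI_PAGE_SIZE 4096ULL

struct ram_range {
    uint64_t start;
    uint64_t size;
};

/* Collect conventional RAM that the firmware reports as write-back cacheable.
   Steps by desc_size (from GetMemoryMap()), which may exceed sizeof(efi_memory_descriptor). */
static size_t cacheable_ram_ranges(const void *map, size_t map_size, size_t desc_size,
                                   struct ram_range *out, size_t max_out) {
    size_t n = 0;
    if (desc_size < sizeof(efi_memory_descriptor))
        return 0;
    for (size_t off = 0; off + desc_size <= map_size && n < max_out; off += desc_size) {
        efi_memory_descriptor d;
        memcpy(&d, (const uint8_t *)map + off, sizeof d);  /* avoids unaligned access */
        if (d.type == EFI_CONVENTIONAL_MEMORY && (d.attribute & EFI_MEMORY_WB)) {
            out[n].start = d.physical_start;
            out[n].size = d.number_of_pages * EFI_PAGE_SIZE;
            n++;
        }
    }
    return n;
}
```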

I had a stab at implementing a generic EFI decompressor for all non-x86 EFI architectures in Linux (arm64, ARM, RISC-V, LoongArch) here, which mostly works fine. The only issue there is that it breaks UEFI secure boot, unless we find a way to sign both the inner and outer PE/COFF images during the build.

ardbiesheuvel, Aug 04 '22

@ardbiesheuvel hmm, that's really helpful.

The Secure Boot signing thing is interesting indeed. If we had the decompression in sd-boot, signing would indeed be easier. In that light it might be better to decompress in the boot loader instead of in the kernel image.

So I am not totally opposed to doing decompression in sd-boot. But I'd really like to keep this minimal, i.e. only one relevant compression algorithm (zstd then?), and I'd be really keen on managing this in a reasonable way so that we can still receive updates from the zstd reference implementation sensibly, i.e. ideally via meson subproject stuff. Not sure if that is possible with the reference implementation though, given we also need it without referencing any libc symbols...

I wonder how open the zstd people would be to make things like this easy. I think they actually support building with meson, question is if we can vendor it in as meson subproject that way and without linking to libc...

poettering, Aug 04 '22

> I had a stab at implementing a generic EFI decompressor for all non-x86 EFI architectures in Linux (arm64, ARM, RISC-V, LoongArch) here, which mostly works fine. The only issue there is that it breaks UEFI secure boot, unless we find a way to sign both the inner and outer PE/COFF images during the build.

Well, we're faced with the same issue already?

The way we work around it right now is that we copy the kernel image from the stub to correctly allocated memory and then call the PE entry point directly. We don't even perform PE relocations (I guess we're lucky the kernel doesn't blow up on us?).
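For readers unfamiliar with what loading a PE image by hand involves in general, the core of it is laying out sections in memory: copy each section's raw data to its VirtualAddress inside an image-sized buffer, zero-fill the rest, and later jump to base + AddressOfEntryPoint. The sketch below shows only the layout step, on a hypothetical pre-parsed section list. Like the workaround described above, it applies no relocations, and it also skips signature verification and measurement, which the firmware's LoadImage() would otherwise handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pre-parsed view of one PE section header. */
struct pe_section {
    uint32_t virtual_address;  /* RVA of the section once loaded */
    uint32_t virtual_size;     /* size in memory; may exceed raw_size (zero-filled tail) */
    uint32_t raw_offset;       /* PointerToRawData in the file */
    uint32_t raw_size;         /* SizeOfRawData (may be padded to FileAlignment) */
};

/* Lay sections out in an image-sized buffer the way a PE loader would, but without
   relocations, signature checks or measurements. Returns 0 on success, -1 if a section
   falls outside either buffer. The entry point would then be image + AddressOfEntryPoint. */
static int pe_load_sections(const uint8_t *file, size_t file_size,
                            const struct pe_section *secs, size_t nsecs,
                            uint8_t *image, size_t image_size) {
    memset(image, 0, image_size);  /* covers uninitialised (.bss-like) data */
    for (size_t i = 0; i < nsecs; i++) {
        const struct pe_section *s = &secs[i];
        uint32_t copy = s->raw_size < s->virtual_size ? s->raw_size : s->virtual_size;
        if ((uint64_t)s->virtual_address + s->virtual_size > image_size ||
            (uint64_t)s->raw_offset + copy > file_size)
            return -1;
        memcpy(image + s->virtual_address, file + s->raw_offset, copy);
    }
    return 0;
}
```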

Also, why not implement this decompression within the normal stub (so you'd only have the payload compressed in a section)?

medhefgo, Aug 08 '22

> > I had a stab at implementing a generic EFI decompressor for all non-x86 EFI architectures in Linux (arm64, ARM, RISC-V, LoongArch) here, which mostly works fine. The only issue there is that it breaks UEFI secure boot, unless we find a way to sign both the inner and outer PE/COFF images during the build.

> Well, we're faced with the same issue already?

> The way we work around it right now is that we copy the kernel image from the stub to correctly allocated memory and then call the PE entry point directly. We don't even perform PE relocations (I guess we're lucky the kernel doesn't blow up on us?).

Who are 'we' in this context? systemd-boot?

If systemd-boot implements EFI boot by calling LoadImage and then, instead of calling StartImage, doing something else to boot the image, it is definitely doing something non-portable, and this will generally only work with the x86 kernel. The same goes for the EFI handover protocol, which has been deprecated because it blurs the lines between EFI, Linux and architecture too much.

Note that LoadImage() will take care of the PE/COFF relocations, so this should work even for chainloading arbitrary PE/COFF executables other than Linux (Linux PE/COFF images are built without relocations).
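PE loaders locate base relocations through the BaseRelocationTable entry (index 5) of the optional header's data directories, so whether a given image needs relocating at all can be checked by looking at that entry's size. A minimal sketch handling both PE32 and PE32+ optional headers follows; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd_le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

#define DIR_BASERELOC 5   /* IMAGE_DIRECTORY_ENTRY_BASERELOC */

/* 1 = has base relocations, 0 = none, -1 = not a PE image we understand. */
static int pe_has_base_relocs(const uint8_t *buf, size_t len) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    uint32_t pe = rd_le32(buf + 0x3c);                 /* e_lfanew */
    if (pe > len || len - pe < 24 || memcmp(buf + pe, "PE\0\0", 4) != 0)
        return -1;
    uint16_t opt_size = rd_le16(buf + pe + 20);        /* SizeOfOptionalHeader */
    const uint8_t *opt = buf + pe + 24;
    if (opt_size < 2 || len - pe - 24 < opt_size)
        return -1;
    size_t count_off, dir_off;
    switch (rd_le16(opt)) {                            /* optional header Magic */
    case 0x20b: count_off = 108; dir_off = 112; break; /* PE32+ */
    case 0x10b: count_off = 92;  dir_off = 96;  break; /* PE32 */
    default: return -1;
    }
    if (opt_size < count_off + 4)
        return -1;
    if (rd_le32(opt + count_off) <= DIR_BASERELOC)     /* NumberOfRvaAndSizes */
        return 0;                                      /* directory list ends before entry 5 */
    if (opt_size < dir_off + (DIR_BASERELOC + 1) * 8)
        return -1;
    /* Each data directory entry is {uint32 VirtualAddress, uint32 Size}. */
    return rd_le32(opt + dir_off + DIR_BASERELOC * 8 + 4) != 0;
}
```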

But doing anything other than calling StartImage is a hack. This includes calling into shim to load and/or start the kernel; shim+grub+stub on x86 is such a pile of hacks it is not even funny anymore.

> Also, why not implement this decompression within the normal stub (so you'd only have the payload compressed in a section)?

The 'payload' in this case is the kernel's executable image. Which means we still need another executable image to perform the actual decompression. (On Linux/arm64, the stub and the kernel proper are essentially the same executable image with different entry points)

ardbiesheuvel, Aug 08 '22

With secure boot in the mix there really are only three options:

  1. Have the payload be signed (yuck)
  2. Do the work of LoadImage/StartImage ourselves if we trust the payload (this is what sd-stub does on non-x86 right now)
  3. Hack into EFI_SECURITY_ARCH_PROTOCOL to trick the firmware into trusting our payload (this is how shim allows us to boot shim-signed binaries).
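Option 3 works because firmware consults the security architecture protocols from within LoadImage(). A heavily simplified sketch of the override pattern: save the firmware's authentication callback, install one that accepts only the exact buffer the stub already trusts (because it was covered by the signature on the outer image), and restore the original afterwards. The struct is a stand-in modeled loosely on EFI_SECURITY2_ARCH_PROTOCOL's FileAuthentication(); the status codes are made up, the older EFI_SECURITY_ARCH_PROTOCOL may need the same treatment, and `firmware_deny_all` is a hypothetical placeholder for the firmware's real policy.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int status_t;                  /* stand-in for EFI_STATUS */
enum { ST_SUCCESS = 0, ST_SECURITY_VIOLATION = 1 };

typedef struct security2_protocol security2_protocol;
typedef status_t (*file_auth_fn)(const security2_protocol *self, const void *device_path,
                                 void *file_buffer, size_t file_size, bool boot_policy);

/* Stand-in modeled on EFI_SECURITY2_ARCH_PROTOCOL: LoadImage() calls this to vet a buffer. */
struct security2_protocol {
    file_auth_fn file_authentication;
};

/* Placeholder for the firmware's own policy: reject everything (as for an unsigned image). */
static status_t firmware_deny_all(const security2_protocol *self, const void *device_path,
                                  void *file_buffer, size_t file_size, bool boot_policy) {
    (void)self; (void)device_path; (void)file_buffer; (void)file_size; (void)boot_policy;
    return ST_SECURITY_VIOLATION;
}

static file_auth_fn original_auth;
static const void *trusted_buffer;
static size_t trusted_size;

/* Accept exactly the payload we vouch for; defer everything else to the firmware. */
static status_t override_auth(const security2_protocol *self, const void *device_path,
                              void *file_buffer, size_t file_size, bool boot_policy) {
    if (trusted_buffer && file_buffer == trusted_buffer && file_size == trusted_size)
        return ST_SUCCESS;
    return original_auth(self, device_path, file_buffer, file_size, boot_policy);
}

static void security_override_install(security2_protocol *p, const void *buf, size_t size) {
    original_auth = p->file_authentication;
    trusted_buffer = buf;
    trusted_size = size;
    p->file_authentication = override_auth;
}

static void security_override_uninstall(security2_protocol *p) {
    p->file_authentication = original_auth;
    trusted_buffer = NULL;
    trusted_size = 0;
}
```

In use, the override would be installed just before LoadImage() is called on the embedded kernel and removed as soon as it returns, so the window in which the firmware's policy is bypassed stays as small as possible.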

Pick your poison. I originally wanted to do 3 because it allows us to use LoadImage/StartImage and is fairly easy to do. The person who added arm64 support to sd-stub chose 2.

medhefgo, Aug 08 '22

> With secure boot in the mix there really are only three options:
>
> 1. Have the payload be signed (yuck)
> 2. Do the work of LoadImage/StartImage ourselves if we trust the payload (this is what sd-stub does on non-x86 right now)
> 3. Hack into EFI_SECURITY_ARCH_PROTOCOL to trick the firmware into trusting our payload (this is how shim allows us to boot shim-signed binaries).

So what is sd-stub? Is that part of systemd-boot?

> Pick your poison. I originally wanted to do 3 because it allows us to use LoadImage/StartImage and is fairly easy to do. The person who added arm64 support to sd-stub chose 2.

Wait, what? Are you saying that systemd-boot on arm64 does not use LoadImage/StartImage? Or only with secure boot enabled?

In any case, this is fundamentally broken, and a huge burden in terms of technical debt: things like measured boot and other processing that is implicitly part of LoadImage/StartImage will have to be reimplemented in your own LoadImage/StartImage implementation. This is exactly why shim+grub is such a horror show, and I am really disappointed to learn that the formerly 'EFI-clean' gummiboot/systemd-boot has gone down this road.

ardbiesheuvel, Aug 08 '22

> So what is sd-stub? Is that part of systemd-boot?

sd-stub is the stub loader for unified kernel images. It allows stuffing the kernel image + discovery metadata + initrd + dtb + kernel args + splash image etc into a single efi binary (which is then supposed to be signed).

> Wait, what? Are you saying that systemd-boot on arm64 does not use LoadImage/StartImage? Or only with secure boot enabled?

> In any case, this is fundamentally broken, and a huge burden in terms of technical debt: things like measured boot and other processing that is implicitly part of LoadImage/StartImage will have to be reimplemented in your own LoadImage/StartImage implementation. This is exactly why shim+grub is such a horror show, and I am really disappointed to learn that the formerly 'EFI-clean' gummiboot/systemd-boot has gone down this road.

sd-boot has two types of boot entries: regular .conf ones and unified kernel images. The .conf ones are launched using LoadImage/StartImage. A unified kernel image (sd-stub) is also launched with LoadImage/StartImage, and it then chains into the kernel (on x86 using the EFI handover protocol, on other architectures by copying the kernel payload to a suitably allocated area and calling into the PE entry point with a custom-created EFI_LOADED_IMAGE_PROTOCOL).

We do intend to drop EFI handover at some point, but LINUX_INITRD_MEDIA_GUID support is too recent to do that (on x86).

See https://github.com/systemd/systemd/blob/2fb11652381c199ad19bb469e530543366d99dd4/src/boot/efi/linux.c#L107 for how we setup/call into the kernel if you're interested.

medhefgo, Aug 08 '22

> So what is sd-stub? Is that part of systemd-boot?

It's part of systemd, but not of systemd-boot (though it shares some sources with it). It's a UEFI stub that does various nice things around unified kernels, boot splash and TPM measurements.

You can use systemd-boot without buying into systemd-stub. You can also use systemd-stub without buying into systemd-boot. But ideally you use them in combination.

Here's the man page of systemd-stub, to give you an idea why it is useful:

https://www.freedesktop.org/software/systemd/man/systemd-stub.html

poettering, Aug 08 '22

As a temporary fix (because I'm hacking systemd-boot support into anaconda as a POC), I've been detecting the combination of systemd-boot and a compressed arm kernel and just decompressing it in kernel-install. That isn't ideal on some of these really slow SD/etc. devices, but at the moment it works.

jlinton, Aug 10 '22

Just FYI, this is also an issue on Asahi Linux (Linux on M1 Macs).

Asahi ships GRUB out of the box, and when switching to systemd-boot I hit this issue. I haven't been able to make it work with an uncompressed kernel either, though that last part might be user error.

Currently I'm booting without the stub (i.e. not using a UEFI bundle but a separate kernel+initrd).

WhyNotHugo, Aug 17 '22

> As a temporary fix (because I'm hacking systemd-boot support into anaconda as a POC), I've been detecting the combination of systemd-boot and a compressed arm kernel and just decompressing it in kernel-install. That isn't ideal on some of these really slow SD/etc. devices, but at the moment it works.

Can you give some more information on how you did this? I've been trying to do the same and while decompressing the image works, that leaves you with the linux kernel image in ELF format which EFI doesn't recognize. How do you go back from the decompressed kernel image to the PE/COFF format that EFI recognizes?

DaanDeMeyer, Aug 30 '22

So I've been testing @ardbiesheuvel's generic compressed boot for EFI patchset (https://lore.kernel.org/lkml/[email protected]/T/), and while it works perfectly for the regular Linux image scenario when started from sd-boot, it doesn't quite work with the stub yet.

The issue I'm running into is that zboot is querying the loaded image device path protocol of the loaded image, but we never install a loaded image device path protocol in the stub, so zboot fails.

The spec mentions that the device path protocol for the loaded image is optional. @ardbiesheuvel Would it be possible to make the loaded image device path protocol handling in zboot optional (don't do anything with it if it isn't in place)? If not, we'll need to modify sd-boot to install a loaded image device path protocol in the stub code to make this work.

It's also interesting to note that the zboot patchset chooses to sign both the outer zboot image and the inner Linux image to make secure boot work, whereas in the stub we work around the secure boot issue by invoking the kernel PE entry point directly. Maybe it makes sense for us to adopt the same approach for unified kernel images so that we can use LoadImage/StartImage instead of the hacks we do now?

cc @medhefgo since you mentioned using LoadImage()/StartImage() as well in the PR that added the PE kernel entry stuff.

DaanDeMeyer, Sep 20 '22

I am in the process of switching to LoadImage/StartImage in our stub. There is a neat hack using the EFI_SECURITY_ARCH_PROTOCOL to work around any unsigned payloads (effectively the same way we support shim protocol in sd-boot). I was also investigating adding zstd support into the stub, from the first look it should be fairly easy to do…

medhefgo, Sep 20 '22

> So I've been testing @ardbiesheuvel's generic compressed boot for EFI patchset (https://lore.kernel.org/lkml/[email protected]/T/), and while it works perfectly for the regular Linux image scenario when started from sd-boot, it doesn't quite work with the stub yet.

> The issue I'm running into is that zboot is querying the loaded image device path protocol of the loaded image, but we never install a loaded image device path protocol in the stub, so zboot fails.

> The spec mentions that the device path protocol for the loaded image is optional.

Hmm, my copy of the EFI spec says "The Loaded Image Device Path Protocol must be installed onto the image handle of a PE/COFF image loaded through the EFI Boot Service LoadImage()." Where does the spec mention that it is optional?

> @ardbiesheuvel Would it be possible to make the loaded image device path protocol handling in zboot optional (don't do anything with it if it isn't in place)? If not, we'll need to modify sd-boot to install a loaded image device path protocol in the stub code to make this work.

You can just install it with a NULL pointer and things should work - this is what EFI itself does if LoadImage() is called with a NULL device path and source buffer and size.

> It's also interesting to note that the zboot patchset chooses to sign both the outer zboot image and the inner Linux image to make secure boot work, whereas in the stub we work around the secure boot issue by invoking the kernel PE entry point directly.

As I have stated many times before, this is really a hack. I fully understand that this has been a necessary hack on x86 PCs built to run Windows, but on arm64, I strongly urge you not to roll your own LoadImage/StartImage() - this has been a constant source of bugs with Fedora's version of GRUB, for instance. (I mentioned this in my LPC talk last week as well)

> Maybe it makes sense for us to adopt the same approach for unified kernel images so that we can use LoadImage/StartImage instead of the hacks we do now?

That would be much better, yes.

> cc @medhefgo since you mentioned using LoadImage()/StartImage() as well in the PR that added the PE kernel entry stuff.

ardbiesheuvel, Sep 20 '22

> I am in the process of switching to LoadImage/StartImage in our stub. There is a neat hack using the EFI_SECURITY_ARCH_PROTOCOL to work around any unsigned payloads (effectively the same way we support shim protocol in sd-boot).

Yes, I played around with that as well, but @mjg59 mentioned that this is not a reliable workaround, and therefore not enabled by default in shim

> I was also investigating adding zstd support into the stub, from the first look it should be fairly easy to do…

I just merged the EFI zboot support for 6.1 so you will only need this for older kernels. Still seems like a reasonable place to use compression, so I'm not saying it is a bad idea.

ardbiesheuvel, Sep 20 '22

> Hmm, my copy of the EFI spec says "The Loaded Image Device Path Protocol must be installed onto the image handle of a PE/COFF image loaded through the EFI Boot Service LoadImage()." Where does the spec mention that it is optional?

Ah, my statement was wrong: the spec says it's optional to provide a device path to LoadImage() when using the Source arguments. But I assume in that case LoadImage() will synthesize a device path itself and put that in LOADED_IMAGE_DEVICE_PATH_PROTOCOL. So it does indeed seem to be required, and we should make sure one is installed in sd-boot, regardless of whether we use LoadImage() or keep the PE hack we do now.

DaanDeMeyer, Sep 20 '22