[Bug] Unable to boot with new(er) kernel

Open wociscz opened this issue 1 year ago • 10 comments

Description

I can't boot the VM with any kernel newer than Firecracker's 4.14. I always get:

[   12.489510] /dev/root: Can't open blockdev
[   12.489784] VFS: Cannot open root device "vda" or unknown-block(0,0): error -6
[   12.490205] Please append a correct "root=" boot option; here are the available partitions:
[   12.490717] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

I tried Firecracker's 5.10.223 and 6.1.102 kernels and also built my own with the .config provided in the repo, all with the same error as pasted above. With the 4.14 kernel the VM boots without any problem, but it lacks nftables support, which is the reason I'm trying to build a newer one.

The static JSON config, and in particular the rootfs drive path options, are the same for all kernel variants; only kernel_image_path changes.

The rootfs is an alpine.ext4 file made with the help of this doc.

The host OS is Ubuntu with a 6.9.5 kernel.

To Reproduce

  • Download the mentioned kernel(s) for firecracker
  • Create the rootfs following the provided docs
  • Try to boot the VM with 4.14 kernel -> boots ok
  • Try to boot the VM with 5.10 or 6.1 kernel -> fails

Expected behaviour

The VM boots with a newer or self-built kernel without any problem.

Environment

  • Firecracker version: 1.9.0
  • Host and guest kernel versions: 6.9.5, 4.14, 5.10, 6.1
  • Rootfs used: ext4 in file, Alpine 3.20
  • Architecture: x86_64

Additional context

static json config for the VM:

{
  "boot-source": {
    "kernel_image_path": "path_to_vmlinux_kernel",
    "boot_args": "ro console=ttyS0 noapic reboot=k panic=1 pci=off ip=10.0.1.111::10.0.0.1:255.255.252.0::eth0:off",
    "initrd_path": null
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "partuuid": null,
      "is_root_device": true,
      "cache_type": "Unsafe",
      "is_read_only": false,
      "path_on_host": "alpine.ext4",
      "io_engine": "Sync",
      "rate_limiter": null,
      "socket": null
    }
  ],
  "machine-config": {
    "vcpu_count": 2,
    "mem_size_mib": 1024,
    "smt": false,
    "track_dirty_pages": false,
    "huge_pages": "None"
  },
  "cpu-config": null,
  "balloon": null,
  "network-interfaces": [],
  "vsock": null,
  "logger": null,
  "metrics": null,
  "mmds-config": null,
  "entropy": null
}

Checks

  ✅ Have you searched the Firecracker Issues database for similar problems?
  ✅ Have you read the existing relevant Firecracker documentation?
  ✅ Are you certain the bug being reported is a Firecracker issue?

wociscz avatar Sep 23 '24 13:09 wociscz

I have been having the same problems for weeks/months and have not been able to solve it. In my case I was running 5.10 fine for several months, until it stopped working on new hosts. I've tried Intel and AMD CPUs, built different kernel versions (5.10, 6.1, 6.9), used included and pre-built kernels, used different boot args (e.g. specifying root), built several root filesystems in different ways (ext4 as I did previously, using the included scripts, using Docker, building manually according to the guide), and played with permissions/uids.

I initially suspected it was due to me switching from building the rootfs on the host system to building it in a Docker container; however, I never got it working again.

Edit: I logged back onto the host that worked. It ran firecracker v1.3.3. Booting a VM with that version works. When I try to boot the same vmlinux with v1.8.0 it fails with the error mentioned in OP.

Linux version and command line args passed by default on firecracker v1.3.3

[    0.000000] Linux version 5.10.184 (root@XXXX) (gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0, GNU ld (GNU Binutils for Ubuntu) 2.39) #1 SMP Wed Jun 14 18:10:02 UTC 2023
[    0.000000] Command line: noapic reboot=k panic=1 pci=off nomodules ro console=ttyS0 root=/dev/vda rw virtio_mmio.device=4K@0xd0000000:5 virtio_mmio.device=4K@0xd0001000:6 virtio_mmio.device=4K@0xd0002000:7

Linux version and command line args passed by default on firecracker v1.8.0

[    0.000000] Linux version 5.10.184 (root@XXXX) (gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0, GNU ld (GNU Binutils for Ubuntu) 2.39) #1 SMP Wed Jun 14 18:10:02 UTC 2023
[    0.000000] Command line: panic=1 pci=off nomodules ro console=ttyS0 noapic reboot=k root=/dev/vda rw virtio_mmio.device=4K@0xd0000000:5 virtio_mmio.device=4K@0xd0001000:6

Edit 2:

  • v1.3.3 works
  • v1.6.0 works
  • v1.7.0 works
  • v1.8.0 fails
  • v1.9.0 fails

Kevin-A avatar Sep 25 '24 05:09 Kevin-A

Ok, thanks for the hint about the older versions. It never occurred to me to try them.

I can confirm that with Firecracker v1.7.0 my config works and the microVM boots without any issue. Newer versions fail. The only change in that case is the firecracker binary.

Edit: Finally, after some tweaking (compiling my own 6.1 kernel), I am able to run Docker inside Firecracker, which was my original intent. Only the boot problem with Firecracker v1.8.0 and v1.9.0 persists.

wociscz avatar Sep 25 '24 10:09 wociscz

Hello, and thanks for reporting this.

I suspect this has to do with us introducing ACPI support with Firecracker v1.8.0. For mainline kernels to work, we need to compile the kernel with both CONFIG_ACPI and CONFIG_PCI (https://github.com/firecracker-microvm/firecracker/blob/main/docs/kernel-policy.md#booting-with-acpi-x86_64-only).

If only CONFIG_ACPI is enabled, the kernel fails to parse the ACPI tables and doesn't load the virtio drivers, so mounting the rootfs naturally fails with the error you pasted in the issue description. For our CI, we use Amazon Linux kernels, which include a fix that allows kernels built with CONFIG_ACPI only to boot.

We are also trying to upstream the same fix: https://www.spinics.net/lists/linux-acpi/msg125662.html

The weird thing, though, is that you observe the behaviour with the kernels from our CI. Could you please:

  1. Provide a full kernel log from a failed boot sequence?
  2. Try to build your kernel with both CONFIG_ACPI and CONFIG_PCI enabled and retry?

Disabling ACPI altogether should also work; however, we are deprecating MPTable for booting, so I'd really like it if we could make building with ACPI smoother :)

bchalios avatar Sep 27 '24 09:09 bchalios
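
As an illustration of the check suggested above, here is a minimal Python sketch that reports which of CONFIG_ACPI and CONFIG_PCI are not enabled in a kernel .config. The function names and the sample text are hypothetical and are not part of Firecracker or the kernel tooling; the sketch only assumes the usual .config line format (`CONFIG_X=y`, `# CONFIG_X is not set`).

```python
import re


def enabled_options(config_text):
    """Return the set of CONFIG_* options set to 'y' or 'm' in .config text."""
    enabled = set()
    for line in config_text.splitlines():
        # Disabled options appear as comments ("# CONFIG_X is not set"),
        # so they never match this pattern.
        match = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=(y|m)$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def missing_for_acpi_boot(config_text, required=("CONFIG_ACPI", "CONFIG_PCI")):
    """List the required options (hypothetical default set) that are not enabled."""
    enabled = enabled_options(config_text)
    return [opt for opt in required if opt not in enabled]


# Hypothetical .config excerpt: ACPI on, PCI off.
sample = """\
CONFIG_ACPI=y
# CONFIG_PCI is not set
CONFIG_VIRTIO_MMIO=y
"""

print(missing_for_acpi_boot(sample))
```

In practice you would pass the contents of your own .config file instead of the sample string. An empty result only means both options are enabled; it does not guarantee the kernel will boot.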

Boot logs with 6.1.102 and 6.1.custom (own build with CONFIG_ACPI and CONFIG_PCI enabled). Firecracker's json config is the same as in original post.

firecracker_boot_6.1.102.txt firecracker_boot_6.1.custom.txt

wociscz avatar Sep 27 '24 09:09 wociscz

Could you drop the noapic kernel parameter from here:

"boot_args": "ro console=ttyS0 noapic reboot=k panic=1 pci=off ip=10.0.1.111::10.0.0.1:255.255.252.0::eth0:off",

bchalios avatar Sep 27 '24 11:09 bchalios

Yep. That did the trick. Now I can boot with v1.9 without any problem.

wociscz avatar Sep 27 '24 11:09 wociscz

My working boot args are now "boot_args": "ro console=ttyS0 reboot=k panic=1". So it might just be a documentation/howto problem after all. Thanks for the prompt solution.

wociscz avatar Sep 27 '24 13:09 wociscz
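
For anyone applying the same fix to an existing static config, a small Python sketch follows that removes `noapic` from `boot_args`. The helper name `drop_boot_args` is hypothetical, and the config dict is a trimmed-down version of the one in the original post; only the `boot-source` / `boot_args` keys shown there are assumed.

```python
import json


def drop_boot_args(config, unwanted=("noapic",)):
    """Return a copy of the config with the unwanted tokens removed from boot_args."""
    updated = json.loads(json.dumps(config))  # cheap deep copy of JSON-like data
    args = updated["boot-source"]["boot_args"].split()
    kept = [arg for arg in args if arg not in unwanted]
    updated["boot-source"]["boot_args"] = " ".join(kept)
    return updated


# Trimmed-down config based on the one in the original post.
config = {
    "boot-source": {
        "kernel_image_path": "path_to_vmlinux_kernel",
        "boot_args": "ro console=ttyS0 noapic reboot=k panic=1 pci=off",
    }
}

fixed = drop_boot_args(config)
print(json.dumps(fixed["boot-source"], indent=2))
```

The sketch matches whole whitespace-separated tokens, so it leaves other arguments untouched. The working boot args reported above also omit `pci=off`; passing `unwanted=("noapic", "pci=off")` would drop both.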

Yes, we should update the documentation to fix that. If you feel like it, PRs are welcome. Otherwise, we'll open a PR once we find some free time :)

Thanks again for reporting.

bchalios avatar Sep 27 '24 13:09 bchalios

Same problem when I updated from an older Firecracker version - removing noapic from boot_args fixed it :+1:

pktpls avatar Oct 23 '24 14:10 pktpls

I'm getting this same error with Linux 6.8. Is that too new of a kernel? Both CONFIG_ACPI and CONFIG_PCI are enabled in .config.

fideloper avatar Dec 06 '24 13:12 fideloper

Hey, we're closing this one, since removing noapic from the kernel command line fixed the issue reported here.

@fideloper if you're still seeing an issue even without the noapic parameter on the kernel command line, would you please open a separate issue? Thanks!

roypat avatar Aug 20 '25 13:08 roypat