[Bug] Unable to boot with new(er) kernel #4816

Open
wociscz opened this issue Sep 23, 2024 · 10 comments
Labels
Priority: High, Status: WIP

Comments

@wociscz

wociscz commented Sep 23, 2024

Description

I can't boot the VM with any kernel newer than Firecracker's 4.14.
I'm always getting:

[   12.489510] /dev/root: Can't open blockdev
[   12.489784] VFS: Cannot open root device "vda" or unknown-block(0,0): error -6
[   12.490205] Please append a correct "root=" boot option; here are the available partitions:
[   12.490717] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

I tried Firecracker's 5.10.223 and 6.1.102 kernels and also built my own with the .config provided in the repo; all of them fail with the error pasted above.
With the 4.14 kernel the VM boots without any problem (but it lacks nftables support, which is why I'm trying to build a newer one).

The static JSON config, including the rootfs drive options, is the same for all kernel variants; only kernel_image_path changes.

The rootfs is an alpine.ext4 file made with the help of this doc.

The host OS is Ubuntu with a 6.9.5 kernel.

To Reproduce

  • Download the mentioned kernel(s) for firecracker
  • Create the rootfs by following the provided docs
  • Try to boot the VM with 4.14 kernel -> boots ok
  • Try to boot the VM with 5.10 or 6.1 kernel -> fails

Expected behaviour

Boots with newer or own kernel without any problem.

Environment

  • Firecracker version: 1.9.0
  • Host and guest kernel versions: 6.9.5, 4.14, 5.10, 6.1
  • Rootfs used: ext4 in file, Alpine 3.20
  • Architecture: x86_64

Additional context

static json config for the VM:

{
  "boot-source": {
    "kernel_image_path": "path_to_vmlinux_kernel",
    "boot_args": "ro console=ttyS0 noapic reboot=k panic=1 pci=off ip=10.0.1.111::10.0.0.1:255.255.252.0::eth0:off",
    "initrd_path": null
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "partuuid": null,
      "is_root_device": true,
      "cache_type": "Unsafe",
      "is_read_only": false,
      "path_on_host": "alpine.ext4",
      "io_engine": "Sync",
      "rate_limiter": null,
      "socket": null
    }
  ],
  "machine-config": {
    "vcpu_count": 2,
    "mem_size_mib": 1024,
    "smt": false,
    "track_dirty_pages": false,
    "huge_pages": "None"
  },
  "cpu-config": null,
  "balloon": null,
  "network-interfaces": [],
  "vsock": null,
  "logger": null,
  "metrics": null,
  "mmds-config": null,
  "entropy": null
}

Checks

✅ Have you searched the Firecracker Issues database for similar problems?
✅ Have you read the existing relevant Firecracker documentation?
✅ Are you certain the bug being reported is a Firecracker issue?

@Kevin-A

Kevin-A commented Sep 25, 2024

I have been having the same problems for weeks/months and have not been able to solve it. In my case I was running 5.10 fine for several months, until it stopped working on new hosts. I've tried Intel and AMD CPUs, built different kernel versions (5.10, 6.1, 6.9), used included and pre-built kernels, used different boot args (e.g. specifying root), built several root filesystems in different ways (ext4 as I did previously, using the included scripts, using Docker, building manually according to the guide), and played with permissions/uids.

I initially suspected it was caused by switching from building the rootfs on the host system to building it in a Docker container; however, I never got it working again.

Edit:
I logged back onto the host that worked. It ran firecracker v1.3.3. Booting a VM with that version works. When I try to boot the same vmlinux with v1.8.0 it fails with the error mentioned in OP.

Linux version and command line args passed by default on firecracker v1.3.3

[    0.000000] Linux version 5.10.184 (root@XXXX) (gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0, GNU ld (GNU Binutils for Ubuntu) 2.39) #1 SMP Wed Jun 14 18:10:02 UTC 2023
[    0.000000] Command line: noapic reboot=k panic=1 pci=off nomodules ro console=ttyS0 root=/dev/vda rw virtio_mmio.device=4K@0xd0000000:5 virtio_mmio.device=4K@0xd0001000:6 virtio_mmio.device=4K@0xd0002000:7

Linux version and command line args passed by default on firecracker v1.8.0

[    0.000000] Linux version 5.10.184 (root@XXXX) (gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0, GNU ld (GNU Binutils for Ubuntu) 2.39) #1 SMP Wed Jun 14 18:10:02 UTC 2023
[    0.000000] Command line: panic=1 pci=off nomodules ro console=ttyS0 noapic reboot=k root=/dev/vda rw virtio_mmio.device=4K@0xd0000000:5 virtio_mmio.device=4K@0xd0001000:6
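The two command lines above differ mostly in ordering, which makes them hard to compare by eye. A minimal Python sketch that compares them as unordered token lists; the helper name `cmdline_diff` is hypothetical and the strings are copied from the two logs:

```python
def cmdline_diff(old: str, new: str) -> tuple[list[str], list[str]]:
    """Return (tokens only in old, tokens only in new), ignoring order."""
    old_tokens, new_tokens = old.split(), new.split()
    only_old = [t for t in old_tokens if t not in new_tokens]
    only_new = [t for t in new_tokens if t not in old_tokens]
    return only_old, only_new


# Command lines copied from the v1.3.3 and v1.8.0 boot logs above.
V1_3_3 = (
    "noapic reboot=k panic=1 pci=off nomodules ro console=ttyS0 "
    "root=/dev/vda rw virtio_mmio.device=4K@0xd0000000:5 "
    "virtio_mmio.device=4K@0xd0001000:6 virtio_mmio.device=4K@0xd0002000:7"
)
V1_8_0 = (
    "panic=1 pci=off nomodules ro console=ttyS0 noapic reboot=k "
    "root=/dev/vda rw virtio_mmio.device=4K@0xd0000000:5 "
    "virtio_mmio.device=4K@0xd0001000:6"
)

only_old, only_new = cmdline_diff(V1_3_3, V1_8_0)
print("only in v1.3.3:", only_old)
print("only in v1.8.0:", only_new)
```

For these two logs, the only token missing under v1.8.0 is the third virtio_mmio.device entry. noapic appears in both, so the command line text alone does not explain why one version boots and the other does not.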

Edit 2:
v1.3.3 works
v1.6.0 works
v1.7.0 works
v1.8.0 fails
v1.9.0 fails

@wociscz
Author

wociscz commented Sep 25, 2024

OK, thanks for the hint about the older versions. It never occurred to me to try them.

I can confirm that with Firecracker v1.7.0 my config works and the microVM boots without any issue.
Newer versions fail. The only change in that case is the firecracker binary.

Edit: Finally, after some tweaking (compiling my own 6.1 kernel), I am able to run Docker inside Firecracker, which was my original intent. Only the boot problem with Firecracker v1.8.0 and v1.9.0 persists.

bchalios added the Status: WIP and Priority: High labels Sep 25, 2024
@bchalios
Contributor

Hello, and thanks for reporting this.

I suspect this has to do with us introducing ACPI support with Firecracker v1.8.0.
For mainline kernels to work, we need to compile the kernel with both CONFIG_ACPI and CONFIG_PCI (https://github.com/firecracker-microvm/firecracker/blob/main/docs/kernel-policy.md#booting-with-acpi-x86_64-only).

If only CONFIG_ACPI is enabled, the kernel fails to parse the ACPI tables and does not load the virtio drivers, so mounting the rootfs naturally fails with the error you pasted in the issue description. For our CI, we use Amazon Linux kernels, which include a fix that allows kernels built with only CONFIG_ACPI to boot.

We are also trying to upstream the same fix: https://www.spinics.net/lists/linux-acpi/msg125662.html

The weird thing, though, is that you observe the behaviour with the kernels from our CI. Could you please:

  1. provide a full kernel log from a failed boot sequence?
  2. Try to build your kernel with both CONFIG_ACPI and CONFIG_PCI enabled and retry?

Disabling ACPI altogether should also work; however, we are deprecating MPTable for booting, so I'd really like it if we could make building with ACPI smoother :)
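As a quick way to check the second request before rebuilding, here is a minimal Python sketch that scans the text of a kernel .config for options that are not built in. The helper name `missing_options` and the sample config are hypothetical; the sketch assumes the usual `CONFIG_X=y` / `# CONFIG_X is not set` layout of a .config file:

```python
def missing_options(config_text: str,
                    required=("CONFIG_ACPI", "CONFIG_PCI")) -> list[str]:
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_PCI is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


# Hypothetical .config fragment with ACPI enabled but PCI disabled.
sample = """\
CONFIG_ACPI=y
# CONFIG_PCI is not set
CONFIG_VIRTIO_MMIO=y
"""

print(missing_options(sample))
```

In practice you would pass the contents of your own .config, for example `missing_options(open(".config").read())`; an empty list means both options are built in.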

@wociscz
Author

wociscz commented Sep 27, 2024

Boot logs with 6.1.102 and 6.1.custom (own build with CONFIG_ACPI and CONFIG_PCI enabled). Firecracker's json config is the same as in original post.

firecracker_boot_6.1.102.txt
firecracker_boot_6.1.custom.txt

@bchalios
Contributor

Could you drop the noapic kernel parameter from here:

"boot_args": "ro console=ttyS0 noapic reboot=k panic=1 pci=off ip=10.0.1.111::10.0.0.1:255.255.252.0::eth0:off",

@wociscz
Author

wociscz commented Sep 27, 2024

Yep. That did the trick. Now I can boot with v1.9 without a problem.

@wociscz
Author

wociscz commented Sep 27, 2024

My working boot args are now "boot_args": "ro console=ttyS0 reboot=k panic=1"
So it might be only a documentation/howto problem after all. Thanks for the prompt solution.
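For anyone carrying the same setting in an existing config, a minimal Python sketch of the change. The helper name `drop_boot_arg` is hypothetical, and the config dict reproduces only the boot-source section of the original report. It removes just noapic, which is what was requested above; the working boot args quoted here additionally omit pci=off and the ip= setting:

```python
import json


def drop_boot_arg(config: dict, arg: str = "noapic") -> dict:
    """Return a copy of config with `arg` removed from boot-source.boot_args."""
    updated = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    tokens = updated["boot-source"]["boot_args"].split()
    updated["boot-source"]["boot_args"] = " ".join(
        t for t in tokens if t != arg
    )
    return updated


# boot-source section copied from the original report.
config = {
    "boot-source": {
        "kernel_image_path": "path_to_vmlinux_kernel",
        "boot_args": "ro console=ttyS0 noapic reboot=k panic=1 pci=off "
                     "ip=10.0.1.111::10.0.0.1:255.255.252.0::eth0:off",
        "initrd_path": None,
    }
}

fixed = drop_boot_arg(config)
print(fixed["boot-source"]["boot_args"])
# Serialize again for use as a static config file.
print(json.dumps(fixed, indent=2))
```

The function returns a modified copy, so the original config stays available for booting with Firecracker versions older than v1.8.0.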

@bchalios
Contributor

Yes, we should update the documentation to fix that. If you feel like it, PRs are welcome. Otherwise, we'll open a PR once we find some free time :)

Thanks again for reporting.

@pktpls

pktpls commented Oct 23, 2024

Same problem when I updated from an older Firecracker version - removing noapic from boot_args fixed it 👍

@fideloper

I'm getting this same error with Linux 6.8. Is that too new a kernel? Both CONFIG_ACPI and CONFIG_PCI are enabled in .config
