Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pvh: Support booting via PVH ELFNOTE #26

Merged
merged 16 commits into from
Mar 30, 2020
Merged

Conversation

josephlr
Copy link
Contributor

@josephlr josephlr commented Nov 14, 2019

Fixes #6

This PR adds support for booting via the Xen HVM direct boot ABI otherwise known as the PVH Boot Protocol.

This required significant cleanup to how we load the Linux Kernel, as the existing code wasn't doing the exact correct thing. All of the commits that don't start with pvh: are part of this cleanup. The commits that do start with pvh: specifically address adding PVH support (including the transition for 32-bit to 64-bit mode).

This allows our firmware to be used with QEMU's -kernel option. This can be invoked like:

qemu-system-x86_64
-machine type=q35,accel=kvm \
-cpu host \
-smp 1 \
-m 1024 \
-display none \
-serial stdio \
-kernel target/target/release/hypervisor-fw \
-drive file=clear-28660-kvm.img,format=raw,if=none,id=boot-drive \
-device virtio-blk-pci,drive=boot-drive,disable-legacy=on

Also, as cloud-hypervisor supports loading PVH ELF binaries, the PVH entry point will also be used when using cloud-hypervisor's --kernel option. This also means that no current hypervisors will use the Linux Boot Protocol path.

Signed-off-by: Joe Richey [email protected]

@rbradford
Copy link
Member

rbradford commented Nov 15, 2019

@rbradford, is this the same issue that prevents booting on firecracker? Resetting Virtio Devices?

That manifests itself as a failure to mount the filesystem. You would see a lot of kernel messages up to that point.

I think I know why it gets further: when using the -kernel entry point it will use Seabios with the PVH option ROM so it will go through that initialisation code (I also tried qboot and that still has a zero bar for the PCI device.)

Now that you're getting this far with I think you can probably use qemu's gdbserver with the kernel binary to check progress.

Alternatively when I was developing this I build my own kernel and dropped it into the appropriate directory and used "outb" debugging to trace the progress though the assembly code. It might sound fiddly but if you script the build/mount image/copy binary/unmount then it isn't so bad.

It's definitely an issue with some code running in the kernel as I printed out the jump address for both CH and QEMU and they are the same.

@josephlr
Copy link
Contributor Author

josephlr commented Mar 2, 2020

Good news, I've got some local changes that got the firmware all the way booting on QEMU.

Working on breaking them out into a reasonable patchset.

@josephlr
Copy link
Contributor Author

josephlr commented Mar 3, 2020

@rbradford this code should now work with QEMU using the command in the PR description. Try it out and let me know if it works for you. I think it will be much easier to merge this first and then work on #24 (where we need to deal with real-mode code and other subtleties).

This isn't quite ready, I still need to split this up (I ended up rewriting most of src/bzimage.rs). Note that the QEMU PVH implemenation has some minor bugs (I submitted this patch to fix one), but none severe enough to keep Linux from booting.

@rbradford
Copy link
Member

Very cool, works for me 👍

@rbradford
Copy link
Member

I had just a brief skim through the code changes and they look good. I look forward to seeing the breakdown into the individual commits.

@rbradford
Copy link
Member

@josephlr How's this going? We now have PVH support in Cloud Hypervisor so looking forward to seeing this mature. :-)

@josephlr
Copy link
Contributor Author

@josephlr How's this going? We now have PVH support in Cloud Hypervisor so looking forward to seeing this mature. :-)

PTAL, this is now ready for review. I rewrote the commit history to explain all of the changes that are going on here. I had to rewrite most of bzimage.rs as we are now using a very different process to load the kernel.

Let me know if anything is unclear here. I'd be happy to reword any commit messages.

Copy link
Member

@rbradford rbradford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I just want to run it through the integration tests for Cloud Hypervisor.

@@ -244,6 +244,24 @@ pub trait Read {
fn read(&mut self, data: &mut [u8]) -> Result<u32, Error>;
fn seek(&mut self, offset: u32) -> Result<(), Error>;
fn get_size(&self) -> u32;

// Loads the remainer of the file into the specified memory region
fn load_file(&mut self, mem: &mut MemoryRegion) -> Result<(), Error> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@josephlr
Copy link
Contributor Author

Looks good. I just want to run it through the integration tests for Cloud Hypervisor.

@rbradford I just added 4fd1677 (which removes the serial debugging in the ASM) and 6ef27a1 (which cleans up and comments the linker script).

I just wanted to make sure those commits didn't get missed.

src/fat.rs Outdated Show resolved Hide resolved
We don't actually need an AtomicRefcell here. It's fine for
paging::setup() to run multiple times, it's idempontent and we only
do writes.

We also place the page tables in #[no_mangle] static mut variables so
they can be linked from assembly code.

Signed-off-by: Joe Richey <[email protected]>
We really only need from_bytes and as_bytes. There's no need for a
generic from_slice method.

Signed-off-by: Joe Richey <[email protected]>
As our error types are enums, we can include the "source" error in our
later error types. This allows for more information when debugging
failures. To this end, we also implement Debug.

Signed-off-by: Joe Richey <[email protected]>
We can have `ascii_strip` use entirely safe code by `unwrap()`ing the
result of `from_utf8` instead of using `from_utf8_unchecked`. We also
add a helper function to convert a C-string pointer into a byte slice.

Signed-off-by: Joe Richey <[email protected]>
To support multiple boot protocols, we need a common abstraction for
the information given in a boot protocol. This is the Info trait. This
also requires adding a common E820Entry structure, so we can get the
memory map.

We also add a boot::Params structure (i.e. the Linux Zeropage) to make
reading/writing the structure easier. This will let us avoid needing to
hardcode struct offsets.

The layout for these structures is taken from the Kernel's
    arch/x86/include/uapi/asm/bootparam.h

Signed-off-by: Joe Richey <[email protected]>
This allows much of the existing code to be simplified, by letting us
load the remainer of a file into a specific memory region.

This also makes it much easier to write the code to load the Linux
Kernel header from the bzimage file.

Signed-off-by: Joe Richey <[email protected]>
This works because Option<&T> has the same FFI layout as *const T
except that a null pointer is None.

Signed-off-by: Joe Richey <[email protected]>
This allows efi_exec to work with multiple boot protocols.

Signed-off-by: Joe Richey <[email protected]>
This allows Linux to be booted with any boot protocol.

The old code took in the Zeropage passed in via the Linux Kernel Boot
Protocol, modified it, and passed it into the Linux Kernel. This is not
the correct way to boot Linux per the documentation:
    https://www.kernel.org/doc/Documentation/x86/boot.txt

This code now correctly:
  - Uses a brand-new Zeropage inside the `Kernel` struct
  - Adds in the E820 map and RSDP pointer from the boot::Info
  - Reads the header from the file and copies it into the Zeropage
  - Loads the kernel and initrd into avalible memory
  - Properly manages the command-line at a fixed memory location
  - Jumps to the appropriate starting address

Signed-off-by: Joe Richey <[email protected]>
We now pass the boot::Info to `loader::load_default_entry` and
`efi::efi_exec`. We also reorganize the code in this function to:
  - Avoid unnecessary nesting
  - log errors when they occur

The code is now much more readable

Signed-off-by: Joe Richey <[email protected]>
These are simply translated from the C structs in:
    xen/include/public/arch-x86/hvm/start_info.h

Note that unlike the Linux Boot Protocol structures, these structures
don't need to be `#[repr(packed)]` as they are garunteed to have the
proper alignment.

Signed-off-by: Joe Richey <[email protected]>
This approch doesn't scale to our other ASM code and it clutters the output.

Signed-off-by: Joe Richey <[email protected]>
Note that this also requires zeroing out %rdi in the Linux Boot
Protocol path, so that the rdi parameter will have a valid repr.

Signed-off-by: Joe Richey <[email protected]>
PVH starts in 32-bit mode, so we have to transition to 64-bit mode
before we can start running Rust code. As we have not yet initialized
the stack, we can only use registers and static memory.

This transition does the following:
  - Sets up page tables to identity map 2 MiB
  - Loads page tables into CR3
  - Sets CR4.PAE, EFER.LME, and CRO.PG
  - Sets up a 64-bit GDT
  - Long Jumps to 64-bit code

We put the GDT in the .rodata section, and we put the 32-bit code in its
own section. This makes it easier to debug and dissassemble the binary.

Signed-off-by: Joe Richey <[email protected]>
This adds information to the ELF binary so that a loader will know
where to start the executable. This now allows the firmware to be
booted via the PVH Boot Protocol.

Signed-off-by: Joe Richey <[email protected]>
@rbradford
Copy link
Member

@josephlr cool to merge this if you're ready?

@josephlr
Copy link
Contributor Author

@josephlr cool to merge this if you're ready?

@rbradford SGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support more VMMs (QEMU, crosvm, XEN)
2 participants