RFC/WIP: reboot with shutdown/start not reboot #193

jandryuk · 2022-04-28T14:48:27Z

This is only partially complete. It's also on top of my Xen 4.14/4.16 work, so I included 384380a to avoid conflicts and let the other two patches apply.

Today we issue xl reboot and use a signal to synchronize between xenmgr and xl. This is needed since xenmgr issues the viptables commands which are needed for xl to talk QMP to stubdom's qemu for a successfully boot. Actually with vchan-socket-proxy providing QMP proxying, and not using argo, that particular issue is gone. But stubom still needs argo rules to connect back to helpers.

The idea here is our VMs can use on_reboot='destroy' in their config files to shutdown when rebooted. With a patch to libxl, xenmgr can notice this and start the VM again for a reboot (and leave halted for a shutdown). xl needs patching to extend the xenstore /state/$uuid/state tracking for a reboot, but then the signalling patch can be removed.

VM initiated/internal reboot works.

The external xec-vm -n VM reboot does not work. While conceptually simple, my haskell is not strong enough. Vm/Actions.hs:rebootVm is under/returns an Rpc () monad, but startVm/restartVm return XM () monads and I don't know how to mash those together.

Posting RFC to see if there is interest in pursuing this idea further. Help on the monad mismatch would be appreciated.

Set TAPDISK3_CRYPTO_KEYDIR in the environment when launching xl. It will remain set through the block script and into tap-ctl where it is used. The other approach was to have the block-tap script use xec to call back into xenmgr. That could work. The VM case is easy. The stubdom case is more complicated since you'd have to map from the stubdom's domid back to the VM to get the variables. Since xenmgr has the info, just pass it down. This also takes care of the stubdom case since the same xl process is exec-ing the hotplug scripts. Help-from: Chris Rogers <[email protected]> Signed-off-by: Jason Andryuk <[email protected]>

Avoid the need for the reboot signal dance by having xl clean up a domain on shutdown. XenMgr can then restart with just xl start. Signed-off-by: Jason Andryuk <[email protected]>

Expand Xl.start to start the domain from state Rebooted in addition to Shutdown. This allows the restartVm path, triggered by a domain-initiated reboot, to re-start the Vm. Otherwise state Rebooted is ignored and Xl.start does nothing. This refactors the bulk of the code into a _start helper to be called from the two cases. Signed-off-by: Jason Andryuk <[email protected]>

jandryuk · 2022-05-16T13:10:43Z

I have a xec-vm -n VM reboot implementation now which I'll push up soon.

Replace the use of `xl reboot` by `xl shutdown` and `xl start` to implement the vm reboot DBus command. xl shutdown needs on_reboot="destroy" in the xl.cfg file. This allows us to remove the xenmgr <-> xl signal synchronization. Xl.shutdown & RpcAgent shutdowns both wait for the VM to shutdown, except the `xl shutdown -c` hypercall version which seems to terminate right quick. The waitForState call is therefore not really implementing a timeout out. The corner case is if the VM has ACPI & no PV drivers, but is configured not to react to the power button. In that case we can't do much. If the VM then internally reboots, the xenmgr React code will reboot the VM and the waitForState will eventually timeout. writeXlConfig is removed as rebootVm -> restartVm -> _startVm startVmInternal -> bootVm calls writeXlConfig. Signed-off-by: Jason Andryuk <[email protected]>

There is no meaningful difference between restartVm and startVm, so remove restartVm and the unused is_reboot boolean. Signed-off-by: Jason Andryuk <[email protected]>

This code is no longer called, so remove it. Signed-off-by: Jason Andryuk <[email protected]>

Sometimes the shutdown command does not work. It seems like the longer the VM is up, the more likely it is to not shutdown. This is based on Windows 10 without PV drivers. The button press is seen by the VM to some extent because a DPMS off (black) screen will turn back on, but shutdown is not initiated. A second press will trigger it - that was the intent to the xl trigger power line added in commit 1cfb6aa "xenmgr: Add xl trigger power to HVM shutdown". Re-work the code so a background thread is started that will push the power buttons* 3 times each with a 1 second delay. This will hopefully let the VM recognize the button press without going on for too long. *HVMs have two power and two sleep buttons. One is Xen emulating buttons for HVMs, and the second is from QEMU's acpi-pm-features.patch and ACPI changes. xl trigger power is pushing the Xen one and the xenstore hvm-shutdown write is triggering the QEMU one. With forkIO, the `xl shutdown -F -ww` runs before the `xl trigger power` commands. xl shutdown first tries PV shutdown and then (-F) fallback to ACPI. So this would push the Xen ACPI button before the QEMU one in pushPowerButton. Signed-off-by: Jason Andryuk <[email protected]>

jandryuk · 2022-05-16T15:29:10Z

Force pushed to remove use of system_ which is added by #194

jandryuk added 3 commits April 27, 2022 15:51

xenmgr: Use xl.cfg on_reboot='destroy'

7f1b5e1

Avoid the need for the reboot signal dance by having xl clean up a domain on shutdown. XenMgr can then restart with just xl start. Signed-off-by: Jason Andryuk <[email protected]>

jandryuk added 4 commits May 16, 2022 11:14

xenmgr: Remove restartVm and reboot code

9b6d13f

There is no meaningful difference between restartVm and startVm, so remove restartVm and the unused is_reboot boolean. Signed-off-by: Jason Andryuk <[email protected]>

xenmgr: Remove xl reboot code

6419e6a

This code is no longer called, so remove it. Signed-off-by: Jason Andryuk <[email protected]>

jandryuk force-pushed the wip-on-reboot branch from 66ec4a7 to edd9f26 Compare May 16, 2022 15:27

jandryuk mentioned this pull request Nov 7, 2022

xenmgr: Push ACPI power button multiple times #200

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC/WIP: reboot with shutdown/start not reboot #193

RFC/WIP: reboot with shutdown/start not reboot #193

jandryuk commented Apr 28, 2022

jandryuk commented May 16, 2022

jandryuk commented May 16, 2022

RFC/WIP: reboot with shutdown/start not reboot #193

Are you sure you want to change the base?

RFC/WIP: reboot with shutdown/start not reboot #193

Conversation

jandryuk commented Apr 28, 2022

jandryuk commented May 16, 2022

jandryuk commented May 16, 2022