Fred Clift
2016-Mar-30 16:07 UTC
[libvirt-users] libvirt hooks deadlocks and auto-restarting a stopped guest
All,

Short version: I want to auto-restart a particular guest any time it shuts down. libvirt hooks can't call 'virsh start foo'. What is a good/simple way to do this?

Long version: I have what is probably an unusual set of requirements. To summarize briefly: I have an OS image that runs 3rd party binaries that I can't modify. These binaries occasionally get the system into a state that cannot be resolved by restarting the software. We can detect the error state, but not prevent it or fix either the OS or the software. (The joys of closed-source software... sigh.)

For a long time this has been running on real hardware, and a small monitor was written that will initiate a reboot of the system when it gets in that state.

I ALSO use a custom qemu command line option, -snapshot, so that I throw away all changes each time the system shuts down. These are all part of a compute cluster and no non-transient data is ever on these systems.

What I'm trying to do: use snapshots AND on every OS boot, start fresh from the base image. It works if I give two custom options to qemu-kvm (-snapshot -no-reboot). Every time I shut the system down and then start it, I get back to the starting image. But if the error happens, I either shut down (with the -no-reboot option) and stay down, or I do a warm boot and don't discard the filesystem changes.

So this looked to me like a place to use libvirt hooks (CentOS 7; I had to create /etc/libvirt/hooks and restart the libvirtd service). I made a qemu hook that watches for this dom name to get a release event so that I could auto-restart it. Of course you probably already know what I didn't - you can't call 'virsh start' safely from a libvirt hook.

So, how can I make libvirt auto-restart a dom? A libvirt hook doesn't seem to be the way. Several options occur:

- I might make the hook spawn something in the background that sleeps 5 seconds and then does a virsh start - presumably some time after the hook script exits. This seems somewhat problematic, and not reliable.

- I could make a cron job that checks every minute and just restarts it. That means an average wait of about 30 seconds, though, and I'd like the restart to be faster than that.

- I could replace the <emulator>/.....</emulator> in the dom definition with a script that wraps the real emulator in a loop, maybe...

- I could write a libvirt API-consuming application that watches for the right events (reboot and shutdown) and then does the right thing.

Other ideas?

Fred Clift
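
For reference, the last option above (an application that consumes libvirt events) might be sketched roughly as follows with the libvirt-python bindings. This is an untested sketch under several assumptions, not a known-good solution: the guest is a persistent domain named "foo" (the name used in the 'virsh start foo' example), the connection URI is qemu:///system, and a guest that exits because of -no-reboot is reported to clients as a "stopped" lifecycle event. The --watch flag and the helper name make_lifecycle_callback are made up for illustration. Because this runs as an ordinary client outside libvirtd, it should not be subject to the hook-script restriction.

```python
#!/usr/bin/env python3
"""Restart one guest whenever libvirt reports that it has stopped (sketch)."""
import sys

GUEST_NAME = "foo"            # assumption: guest name from 'virsh start foo'
LIBVIRT_URI = "qemu:///system"  # assumption: local system libvirtd


def make_lifecycle_callback(guest_name, stopped_event):
    """Build a lifecycle callback that starts guest_name again after it stops.

    stopped_event is the numeric lifecycle event to react to; main() passes
    libvirt.VIR_DOMAIN_EVENT_STOPPED.
    """
    def on_lifecycle(conn, dom, event, detail, opaque):
        if dom.name() == guest_name and event == stopped_event:
            dom.create()  # same effect as 'virsh start <guest>'
            return True   # a restart was requested
        return False      # unrelated guest or unrelated event
    return on_lifecycle


def main():
    # Imported here so the helper above can be exercised without libvirt.
    import libvirt

    # The default event loop has to be registered before opening the connection.
    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.open(LIBVIRT_URI)

    callback = make_lifecycle_callback(GUEST_NAME, libvirt.VIR_DOMAIN_EVENT_STOPPED)
    # Passing None as the domain subscribes to lifecycle events for all guests;
    # the callback filters by name.
    conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, callback, None
    )

    # Each call processes one iteration of pending events.
    while True:
        libvirt.virEventRunDefaultImpl()


# Hypothetical flag, so that merely loading this file never opens a connection.
if __name__ == "__main__" and sys.argv[1:] == ["--watch"]:
    main()
```

Something like this would presumably be run as a small daemon next to libvirtd (for example from a systemd unit) with the --watch argument. It does no error handling: if dom.create() fails or the connection to libvirtd drops, the script simply dies, so a real version would want retries and reconnection.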