Derek Atkins
2020-Feb-10 15:18 UTC
ANNOUNCE: [MAINT] Unexpected reboot/downtime of code (gnucash server)
Hi,

TL;DR: The Ovirt VM system rebooted last night but the VMs didn't come back up. They are now back up and running normally (and the cause of the lack of restart has been corrected).

Long Version:

Some of you may have noticed that code was unavailable for the past 12 hours. Apparently the ovirt host rebooted last night around 10:45pm local time and the script to start the VMs on reboot didn't work.

I've spent the past 3 hours debugging and determined the problem with the script was that the ovirt engine reports invalid state immediately upon reboot. Specifically, it reports that the storage domains are "up" even when they are not. It corrects itself shortly, but the startup script sees the storage as "up" and then tries to start the VMs (which fail to start).

This has been fixed by adding a short delay between when the engine reports as "up" and when the script starts testing for the storage domains. I know this works because it ran from a clean restart of the ovirt host system.

Still of concern is why the machine rebooted last night in the first place. I do not have an answer for that, and the logs don't really show anything of substance. I plan to continue to monitor the situation, and I will add some additional debugging in case it decides to reboot itself yet again. But at least if it does, we know the VMs will come back! :)

Sorry for the downtime.

-derek

--
Derek Atkins 617-623-3745
derek at ihtfp.com www.ihtfp.com
Computer and Internet Security Consultant
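
For anyone curious what the fix looks like in outline, here is a rough sketch of the "wait for the engine, pause, then poll storage" ordering described above. It is not the actual startup script. The function name, the three callables (`engine_is_up`, `storage_is_up`, `start_vms`) and all the timing values are made up for illustration, and no real oVirt API calls are shown.

```python
import time
from typing import Callable


def wait_then_start_vms(
    engine_is_up: Callable[[], bool],   # hypothetical check: engine answers as "up"
    storage_is_up: Callable[[], bool],  # hypothetical check: storage domains "up"
    start_vms: Callable[[], None],      # hypothetical action: start the VMs
    settle_delay: float = 60.0,         # assumed value, not the real script's delay
    poll_interval: float = 5.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Start the VMs only after storage is believed to be really up.

    Returns True if the VMs were started, False if the timeout expired first.
    """
    deadline = clock() + timeout

    # Phase 1: wait until the engine itself reports "up".
    while not engine_is_up():
        if clock() >= deadline:
            return False
        sleep(poll_interval)

    # Phase 2: right after boot the engine may report stale storage state,
    # so do not look at the storage status at all during this window.
    sleep(settle_delay)

    # Phase 3: only now start trusting what the engine says about storage.
    while not storage_is_up():
        if clock() >= deadline:
            return False
        sleep(poll_interval)

    start_vms()
    return True
```

The `sleep` and `clock` parameters exist only so the ordering can be exercised with a fake clock instead of waiting in real time. A fixed delay is the simplest approach; requiring several consecutive "up" readings would be an alternative, but that is not what was described above.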