Derek Atkins
2022-Jan-18 12:42 UTC
ANNOUNCE: [MAINT] Unplanned hardware outage for code.gnucash.org
Good morning,

At around 11:30pm US/EST last night, the hardware hosting code.gnucash.org crashed. I noticed the outage this morning at around 6:30 and power-cycled the hardware around 7am. The system rebooted and the VMs (including code) were back in operation by around 7:20am.

Looking at the host logs, it looks like the host system was still alive but started having "sanlock" issues around 00:33 this morning, then a watchdog error at 00:34, and an ATA error immediately after. At this point I started getting VDSM errors, another ATA error 10 seconds later, and then VDSM execution errors (most likely all due to the ATA issues). About 30 seconds later the VDSM service exited into "failed state", and the log abruptly ends shortly thereafter at 00:34:55.

According to the reboot log, ATA4 is a 2TB SSD. It is unclear if the issue is the drive itself or the ATA driver.

At this point, the hardware is up and running, but clearly there is "an issue". I'm getting on a plane in 3h30m, so hopefully the system will remain stable until I return Thursday.

-derek

--
Derek Atkins 617-623-3745
derek at ihtfp.com www.ihtfp.com
Computer and Internet Security Consultant