Marko Vojinovic
2019-Jan-01 11:47 UTC
[CentOS] How to troubleshoot partial shutdown problem?
Hi folks,

I've never encountered the following problem before (but I guess there is a first time for everything) --- after issuing a regular shutdown, the system starts the shutdown procedure, but stalls at some point, and never finishes. It gets to the console, writes the "powering down" message and stops there --- the hardware never actually powers off. At that point the machine is completely unresponsive to anything, and I have to hold the power button for 5 seconds to actually force it to turn off.

This is a clean install of C7, fully updated. I doubt that hardware is at fault, since it *does* properly shut down when I boot it from the C7 live USB (which was used to install the system in the first place). The only difference is that the live USB is not updated at all, while the OS on the hard drive has received about 1000 updates after initial installation. So I'm guessing that something in the updates broke something in the shutdown procedure.

Btw, rebooting the machine works properly, no issues.

How am I to troubleshoot this? Most importantly, what is the best way to check (after the power cycle) if the hard drive had been unmounted properly during the previous shutdown, i.e. if the unmounting finished before the stall? I don't want the hard drive to "suffer" from unclean shutdowns, if possible.

Also, what piece of code prints the "powering off" message in the console (since that appears to be the last thing working)? What else to look for, and where? Is this maybe a known issue, is there a fix?

Other than shutting down, the machine works completely ok, so arguably this is not such a big problem (as I plan to have it running 24/7), but still, failing to shut down when I want it to feels somewhat disturbing, and I'd rather have that fixed.

Any suggestions?

Best, :-)
Marko

P.S. Happy new year to everyone! ;-)
Marko Vojinovic
2019-Jan-01 13:06 UTC
[CentOS] How to troubleshoot partial shutdown problem?
On Tue, 1 Jan 2019 12:47:42 +0100
Marko Vojinovic <vvmarko at gmail.com> wrote:

> after issuing a regular shutdown,
> the system starts the shutdown procedure, but stalls at some point,
> and never finishes. It gets to the console, writes the "powering down"
> message and stops there --- the hardware never actually powers off.

Just to add another datapoint --- when booted using the old kernel, 3.10.0-123, the shutdown proceeds correctly, while when booted using the latest kernel, 3.10.0-957.1.3, the shutdown fails. I don't have installed any other in-between kernels to test.

So this appears to be kernel-related. Any suggestions?

TIA, :-)
Marko
> On Tue, 1 Jan 2019 12:47:42 +0100
> Marko Vojinovic <vvmarko at gmail.com> wrote:
>
>> after issuing a regular shutdown,
>> the system starts the shutdown procedure, but stalls at some point,
>> and never finishes. It gets to the console, writes the "powering down"
>> message and stops there --- the hardware never actually powers off.
>
> Just to add another datapoint --- when booted using the old kernel,
> 3.10.0-123, the shutdown proceeds correctly, while when booted using
> the latest kernel, 3.10.0-957.1.3, the shutdown fails. I don't have
> installed any other in-between kernels to test.

I had such problems, but they were related to systemd. We happen to have NFS mounts on our hosts, and when shutting down, the network could be disabled before NFS was unmounted. The shutdown would then hang. This did not happen every time, only sometimes, and it was very nice to debug.

The simple solution was to mount NFS with the option x-systemd.requires=network-online.target so that systemd knows it should unmount it before disabling the network.

Regards,
Simon
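
For anyone wanting to try the NFS mount option mentioned above, here is a minimal sketch of where it would go in /etc/fstab. The server name, export path and mount point are made-up placeholders, not taken from this thread; only the x-systemd.requires option comes from the message. Treat it as an illustration and adapt the rest to the options your mounts already use.

```
# /etc/fstab (hypothetical NFS entry; server, export and mount point are placeholders)
# x-systemd.requires makes the generated mount unit depend on network-online.target,
# so that on shutdown the mount is stopped before the network goes away.
nfs.example.com:/export/data  /mnt/data  nfs  defaults,x-systemd.requires=network-online.target  0 0
```

After editing fstab, running `systemctl daemon-reload` lets systemd regenerate the mount units from the new entry.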
On Tue, 1 Jan 2019, Marko Vojinovic wrote:

> Any suggestions?

We hit this symptom with some machines due to a USB bug:

https://bugzilla.kernel.org/show_bug.cgi?id=66171

kernel arg xhci_hcd.quirks=270336 fixed it for us.

Tracking these problems down often ends up being a fairly painful bisect.

jh
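
The value 270336 in that kernel argument is a bit mask, and it may help to see which bits it sets. The small Python sketch below decodes it. The arithmetic is certain; the flag names are my reading of drivers/usb/host/xhci.h and should be treated as an assumption to verify against the kernel source you are actually running.

```python
# Decode the xhci_hcd.quirks bit mask quoted in the thread.
# ASSUMPTION: the flag names below follow drivers/usb/host/xhci.h in mainline
# kernels; verify them against your own kernel tree before relying on them.
XHCI_QUIRK_NAMES = {
    13: "XHCI_SPURIOUS_REBOOT",
    18: "XHCI_SPURIOUS_WAKEUP",
}


def decode_quirks(value):
    """Return the positions of all set bits in a quirks mask, lowest first."""
    return [bit for bit in range(value.bit_length()) if value & (1 << bit)]


if __name__ == "__main__":
    quirks = 270336  # value from the kernel argument above
    print("mask:", hex(quirks))
    for bit in decode_quirks(quirks):
        # Bits without a known name are reported as "unknown".
        print("bit", bit, XHCI_QUIRK_NAMES.get(bit, "unknown"))
```

So the argument amounts to (1 << 13) | (1 << 18), i.e. 0x42000. On CentOS 7 one way to make it persistent should be `grubby --update-kernel=ALL --args="xhci_hcd.quirks=270336"`, followed by a reboot.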