Not sure if this is the correct subject line, but my recently installed CentOS build (Linux localhost.localdomain 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux) periodically just freezes: it completely locks up, shows no activity, leaves nothing in the logs, and just stops dead, requiring a power off and reboot.

I've really looked around to try to find the _best_ way to set up debugging, but a lot has been written about it by a lot of parties and I'm not sure which source is definitive. I did try booting with the 'CentOS Linux (3.10.0-229.14.1.el7.x86_64) 7 (Core) with debugging' option, but that didn't add anything toward finding a solution. dmesg did report this, however:

dmesg|grep debug
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.14.1.el7.x86_64 root=UUID=1928d2da-784c-4b18-868c-f9858bceea6d ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8 systemd.debug
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.14.1.el7.x86_64 root=UUID=1928d2da-784c-4b18-868c-f9858bceea6d ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8 systemd.debug
[ 0.940176] ehci-pci 0000:00:12.2: debug port 1
[ 0.946472] ehci-pci 0000:00:13.2: debug port 1
[ 1.238335] systemd[1]: Unknown kernel switch systemd.debug. Ignoring.
[ 5.083981] SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts
[ 5.206423] systemd[1]: Unknown kernel switch systemd.debug. Ignoring.

I found that interesting. So, can anyone point me to a detailed guide to debugging what's going on with my build that will help me solve my locking problem?

BTW, I think it might be my Atheros-equipped wireless PCI card:

dmesg|grep -i atheros
[ 2.231264] ath5k: phy0: Atheros AR2414 chip found (MAC: 0x79, PHY: 0x45)

but I can't be sure, because I don't have any real proof that's the issue.

Thanks in advance for your patience and assistance.
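One way to test the Atheros suspicion is to keep the ath5k driver from loading for a while and see whether the freezes stop. The shell function below is a minimal sketch of that: it writes a modprobe.d blacklist entry for a given module. The function name blacklist_module and its optional directory argument are hypothetical conveniences; on a real CentOS 7 system the directory is /etc/modprobe.d and the command has to be run as root.

```shell
# Sketch: write a modprobe.d blacklist entry so a driver is not auto-loaded.
# blacklist_module is a hypothetical helper name. The second argument
# (default /etc/modprobe.d) only exists so it can be tried in a scratch directory.
blacklist_module() {
    module="$1"
    confdir="${2:-/etc/modprobe.d}"
    conf="$confdir/blacklist-$module.conf"
    # "blacklist <module>" stops alias-based auto-loading (e.g. by udev at
    # boot); the module can still be loaded by hand with "modprobe <module>".
    printf 'blacklist %s\n' "$module" > "$conf" || return 1
    # Print the path of the file that was written.
    echo "$conf"
}
```

After running "blacklist_module ath5k" as root and rebooting, "lsmod | grep ath5k" should print nothing; deleting /etc/modprobe.d/blacklist-ath5k.conf undoes it. If the box then runs for well past its usual time-to-freeze, that is decent circumstantial evidence against the wireless card.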
On Fri, Oct 16, 2015 at 7:33 AM, Tod <listacctc at gmail.com> wrote:
> Not sure if this is the correct subject line but my recently installed
> Centos build (Linux localhost.localdomain 3.10.0-229.14.1.el7.x86_64 #1 SMP
> Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux) periodically
> just freezes - completely locks up, no activity, nothing in the logs, just
> stops dead requiring a power off and reboot.

"nothing in the logs"? Have you run memtest for an extended period of time? You might first want to eliminate the possibility that this is a hardware problem.

Akemi
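The "nothing in the logs" point can be checked a bit more systematically: when the kernel notices a stall before the machine dies, it usually says so. The function below is a sketch that greps a syslog file for a few common 3.10-era signatures (soft/hard lockup watchdog reports, hung-task warnings, panics, machine-check events) plus any ath5k lines, so you can see what the wireless driver logged shortly before a freeze. The function name scan_lockups and the pattern list are assumptions, not an exhaustive or official list. It defaults to /var/log/messages, where rsyslog writes on CentOS 7. An empty result doesn't prove much: with a true hard lock, the last messages may never reach the disk.

```shell
# Sketch: search a syslog-format file for kernel lockup/hang reports.
# scan_lockups is a hypothetical helper; the pattern list is an assumption.
scan_lockups() {
    log="${1:-/var/log/messages}"
    # -E: extended regex; -i: case-insensitive, so "soft lockup" and
    # "hard LOCKUP" both match. grep exits 1 when nothing matches.
    grep -Ei 'soft lockup|hard lockup|blocked for more than [0-9]+ seconds|kernel panic|machine check|ath5k' "$log"
}
```

For example, "scan_lockups /var/log/messages-20151011" (a hypothetical rotated file name) checks an older log instead of the current one.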
If you have hardware RAID on this machine, try mounting the XFS partitions with nobarrier. We had similar freezes and this helped us.

On Fri, Oct 16, 2015 at 9:04 PM, Akemi Yagi <amyagi at gmail.com> wrote:
> On Fri, Oct 16, 2015 at 7:33 AM, Tod <listacctc at gmail.com> wrote:
> > Not sure if this is the correct subject line but my recently installed
> > Centos build (Linux localhost.localdomain 3.10.0-229.14.1.el7.x86_64 #1 SMP
> > Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux) periodically
> > just freezes - completely locks up, no activity, nothing in the logs, just
> > stops dead requiring a power off and reboot.
>
> "nothing in the logs"? Have you run memtest for an extended period of
> time? You might first want to eliminate the possibility that this is a
> hardware problem.
>
> Akemi
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos

--
Marius Vaitiekūnas
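To see which XFS file systems are currently mounted with or without barriers, a small sketch like the following reads /proc/mounts, where XFS should list nobarrier among the mount options when barriers are disabled. The function name xfs_barrier_status is hypothetical, and the optional file argument exists only so it can be tried against a saved copy of the mount table.

```shell
# Sketch: print "<mountpoint> barrier|nobarrier" for every XFS mount.
# xfs_barrier_status is a hypothetical helper name.
xfs_barrier_status() {
    mounts="${1:-/proc/mounts}"
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk '$3 == "xfs" {
        state = "barrier"
        # Split the comma-separated option list and look for nobarrier.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) if (opts[i] == "nobarrier") state = "nobarrier"
        print $2, state
    }' "$mounts"
}
```

To make the change persistent, add nobarrier to the options field of the matching /etc/fstab line (for example, changing "defaults" to "defaults,nobarrier"). Barriers are generally only considered safe to disable when the RAID controller has a battery- or flash-backed write cache; without one, a power loss can leave the file system corrupted.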
Akemi Yagi wrote:
> On Fri, Oct 16, 2015 at 7:33 AM, Tod <listacctc at gmail.com> wrote:
>> Not sure if this is the correct subject line but my recently installed
>> Centos build (Linux localhost.localdomain 3.10.0-229.14.1.el7.x86_64 #1 SMP
>> Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux)
>> periodically just freezes - completely locks up, no activity,
>> nothing in the logs, just stops dead requiring a power off and reboot.
>
> "nothing in the logs"? Have you run memtest for an extended period of
> time? You might first want to eliminate the possibility that this is a
> hardware problem.

Actually, we've had that occasionally, on a number of boxes. I *think* they were all SuperMicros (sold by Penguin), and they become unresponsive: when I plug in the monitor-on-a-stick, there's no response at all on the console, and keys do nothing. We have to power cycle them, and nothing ever shows up, not in dmesg.old, not in messages, nowhere. Never figured it out.

mark