Peter Hopfgartner
2009-May-21 13:13 UTC
[CentOS] Random server reboot after update to CentOS 5.3
Dear ML

We upgraded a Dell Poweredge PE 1950 Server on the 8th of May. Since then
the server has rebooted 3 times without external cause (it is located in a
server farm with redundant power supply etc.). Looking at the server's
monitoring infrastructure with Dell's own OpenManage tools, I get
strange errors:

[root at servernew ~]# omreport system esmlog

(....)

Severity : Critical
Date and Time : Mon May 11 17:46:59 2009
Description : System Software event: run-time critical stop was asserted

Severity : Critical
Date and Time : Fri May 15 21:07:57 2009
Description : System Software event: run-time critical stop was asserted

Severity : Critical
Date and Time : Wed May 20 21:00:53 2009
Description : System Software event: run-time critical stop was asserted

(...)

This class of errors never happened before in the more than a year that
the server has been running.

There is no mention of any anomaly, except the boot messages themselves, in
/var/log/messages.

The server runs the 64 bit flavor of CentOS hosting some XEN virtual
machines and some PostgreSQL and MySQL databases. It ran without any
issues with CentOS 5.1 and 5.2.

I interpreted these issues as some kernel/software related problem, but
do not know how to make a more accurate diagnosis of the problem.

Can anybody give me some hint? Has anybody had some similar issue?

Regards,

Peter

--
Dott. Peter Hopfgartner
R3 GIS Srl - GmbH
Via Johann Kravogl-Str. 2
I-39012 Meran/Merano (BZ)
Email: peter.hopfgartner at r3-gis.com
Tel. : +39 0473 494949
Fax : +39 0473 069902
www : http://www.r3-gis.com
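For anyone who wants to pull these events out of the log programmatically, here is a minimal Python sketch. It assumes the "Key : Value" layout shown in the excerpt above, with records separated by blank lines; real omreport output may be laid out differently depending on the OpenManage version. The function names (parse_esmlog, critical_stops) are made up for illustration, and the sample text is copied from the excerpt rather than captured from a live system.

```python
from datetime import datetime

# Sample copied from the post; not captured from a live system.
SAMPLE = """\
Severity : Critical
Date and Time : Mon May 11 17:46:59 2009
Description : System Software event: run-time critical stop was asserted

Severity : Critical
Date and Time : Fri May 15 21:07:57 2009
Description : System Software event: run-time critical stop was asserted

Severity : Critical
Date and Time : Wed May 20 21:00:53 2009
Description : System Software event: run-time critical stop was asserted
"""


def parse_esmlog(text):
    """Split 'Key : Value' records (separated by blank lines) into dicts."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the record collected so far.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(" : ")
        if sep:  # lines without ' : ' (e.g. '(....)') are ignored
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def critical_stops(records):
    """Return the timestamps of critical records mentioning a critical stop."""
    stamps = []
    for rec in records:
        if rec.get("Severity") == "Critical" and "critical stop" in rec.get("Description", ""):
            # Assumes the default C locale for day and month names.
            stamps.append(datetime.strptime(rec["Date and Time"], "%a %b %d %H:%M:%S %Y"))
    return stamps


if __name__ == "__main__":
    for stamp in critical_stops(parse_esmlog(SAMPLE)):
        print(stamp.isoformat())
```

On a real system the text would come from running omreport system esmlog and capturing its output; that step is left out here so the sketch runs anywhere.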
On Thu, 2009-05-21 at 15:13 +0200, Peter Hopfgartner wrote:
> Dear ML
>
> We upgraded a Dell Poweredge PE 1950 Server on the 8th of May. Since then
> the server has rebooted 3 times without external cause (it is located in a
> server farm with redundant power supply etc.). Looking at the server's
> monitoring infrastructure with Dell's own OpenManage tools, I get
> strange errors:
>
> [root at servernew ~]# omreport system esmlog
>
> (....)
>
> Severity : Critical
> Date and Time : Mon May 11 17:46:59 2009
> Description : System Software event: run-time critical stop was asserted
>
> Severity : Critical
> Date and Time : Fri May 15 21:07:57 2009
> Description : System Software event: run-time critical stop was asserted
>
> Severity : Critical
> Date and Time : Wed May 20 21:00:53 2009
> Description : System Software event: run-time critical stop was asserted
>
> (...)
>
> This class of errors never happened before in the more than a year that
> the server has been running.
>
> There is no mention of any anomaly, except the boot messages themselves, in
> /var/log/messages.
>
> The server runs the 64 bit flavor of CentOS hosting some XEN virtual
> machines and some PostgreSQL and MySQL databases. It ran without any
> issues with CentOS 5.1 and 5.2.
>
> I interpreted these issues as some kernel/software related problem, but
> do not know how to make a more accurate diagnosis of the problem.
>
> Can anybody give me some hint? Has anybody had some similar issue?

Hmm... you *definitely* want to take this one to the Dell Linux list.

Having said that, I did some googling for:

omreport run-time critical stop was asserted

and found only one hit, from someone who faced it in April 2007. Dell told
them that it may have been software. I'd start there.

Some additional questions:

What version of CentOS?
What kernel version?
What version of the Dell tools?

-I
On Friday 22 May 2009, Peter Hopfgartner wrote: ...
>> Would it make sense to install the kernel from CentOS 5.2? Any
>> contraindications?
>
> As others have said, you should still have the 5.2 kernel around. Just change
> the grub.conf and reboot. It makes no sense to start swapping around hardware
> until you've tried to revert the kernel.
>
> That said, we've seen hangs and strange kernel messages on several different
> server platforms (HP DL140g3: NMI-related messages logged, HP DL160g5: hangs
> semi-randomly) with the new 5.3 kernels. All of these problems could be
> worked around by booting with the kernel option "nmi_watchdog=0".
>
> /Peter

I am experiencing the same issue with random reboots after a 5.3 upgrade.
Sometimes it will go for days without rebooting; then today it has rebooted
6 times at random times. I have modified grub.conf to go back to
2.6.18-92.1.22.el5xen on my dom0 and my only domU, so we will see what
happens (or hopefully doesn't happen) over the next few days.

I have a 3.0 P4 CPU with HT that does not support 64-bit, so it's running an
i686 kernel. It does have a Broadcom NIC like the one an earlier post was
suspicious of:

02:08.0 Ethernet controller: Intel Corporation 82562EZ 10/100 Ethernet Controller (rev 02)
02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)

I have upgraded about 8 other servers with no random reboot problems, but
they are all running on newer processors with a 64-bit kernel.

--
Dave Jones
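As a rough illustration of the "change the grub.conf and reboot" step, the Python sketch below finds which boot entry belongs to a given kernel version and rewrites the default= line to match. It only works on text in memory. The grub.conf content is an invented example in the usual GRUB legacy layout for a Xen host (the 2.6.18-128.1.10.el5xen entry, the root device and the file names are assumptions, not taken from anyone's machine), and find_entry_index and with_default are hypothetical helper names.

```python
import re

# Invented example of a GRUB legacy config on a Xen host; not from a real machine.
SAMPLE_GRUB_CONF = """\
default=0
timeout=5
title CentOS (2.6.18-128.1.10.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-128.1.10.el5
        module /vmlinuz-2.6.18-128.1.10.el5xen ro root=/dev/VolGroup00/LogVol00
        module /initrd-2.6.18-128.1.10.el5xen.img
title CentOS (2.6.18-92.1.22.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-92.1.22.el5
        module /vmlinuz-2.6.18-92.1.22.el5xen ro root=/dev/VolGroup00/LogVol00
        module /initrd-2.6.18-92.1.22.el5xen.img
"""


def find_entry_index(grub_conf_text, version):
    """Return the 0-based index of the first boot entry whose kernel or
    module lines mention `version`, or None if there is no such entry."""
    index = -1
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comments
        if stripped.startswith("title"):
            index += 1  # each title line starts a new entry
        elif index >= 0 and version in stripped:
            return index
    return None


def with_default(grub_conf_text, index):
    """Return a copy of the config with the first default= line set to `index`."""
    return re.sub(r"(?m)^default=\d+", "default=%d" % index, grub_conf_text, count=1)


if __name__ == "__main__":
    idx = find_entry_index(SAMPLE_GRUB_CONF, "2.6.18-92.1.22.el5xen")
    print("older kernel is entry", idx)
    print(with_default(SAMPLE_GRUB_CONF, idx).splitlines()[0])
```

Entries are counted from 0 in the order their title lines appear, which is why the older kernel in the sample ends up as default=1. Editing the real /boot/grub/grub.conf by hand is just as easy; keep a backup copy either way.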
Peter Hopfgartner
2009-Jun-03 09:27 UTC
[CentOS] Random server reboot after update to CentOS 5.3
Epilogue:

I've tried to disable TSO (ethtool -K eth0 tso off), as was suggested on
the poweredge list. This did not help.

I've configured the machine to start with the 5.2 kernel by changing the
default in /boot/grub/grub.conf. It has been running for 6 1/2 days, now.
I would say that this helped, and it is what I would suggest to others
experiencing the same problem, right now.

Thus, the currently running kernel is 2.6.18-92.1.10.el5xen.

Regards and thanks for all replies,

Peter

--
Dott. Peter Hopfgartner
R3 GIS Srl - GmbH
Via Johann Kravogl-Str. 2
I-39012 Meran/Merano (BZ)
Email: peter.hopfgartner at r3-gis.com
Tel. : +39 0473 494949
Fax : +39 0473 069902
www : http://www.r3-gis.com
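To put the 6 1/2 days of uptime in context, this small Python sketch computes the gaps between the three critical-stop timestamps from the original report and compares the longest one with the reported uptime. The timestamps and the 6.5-day figure come from the thread; everything else is plain arithmetic. Three events are a very small sample, so a longer run is suggestive rather than conclusive.

```python
from datetime import datetime, timedelta

# Critical-stop timestamps as listed in the original report.
stops = [
    datetime(2009, 5, 11, 17, 46, 59),
    datetime(2009, 5, 15, 21, 7, 57),
    datetime(2009, 5, 20, 21, 0, 53),
]

# Time between consecutive stops.
gaps = [later - earlier for earlier, later in zip(stops, stops[1:])]
longest_gap = max(gaps)

# Uptime reported on the 5.2 kernel ("6 1/2 days").
uptime_on_old_kernel = timedelta(days=6.5)

if __name__ == "__main__":
    for gap in gaps:
        print("gap between stops:", gap)
    print("uptime exceeds longest gap:", uptime_on_old_kernel > longest_gap)
```

Both gaps are under five days, so the run on the 5.2 kernel has already outlasted the longest stretch seen on the 5.3 kernel.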