Jeff
2008-Oct-13 13:38 UTC
[CentOS] "EDAC i5000 MC0: FATAL ERRORS Found!!!" error message?
Hi List,

We had the following error thrown on console on a PowerEdge server running CentOS 5 (64 bit). Googling around didn't yield any particular insights. The server crashed a few minutes after this message. Running memtester, just to check, didn't find anything; and the box has been running for months before this without issue.

I'm wondering if anyone has run across this before, and if so, if it was software (CentOS) or hardware (PowerEdge / PowerVault) related?

Oct 8 12:19:35 someServer kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
Oct 8 12:19:35 someServer kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Oct 8 12:19:35 someServer kernel: EDAC MC0: UE row 1, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=0 RDWR=Write RAS=11802 CAS=0 FATAL Err=0x4)

Thanks!
Tim Verhoeven
2008-Oct-13 13:45 UTC
[CentOS] "EDAC i5000 MC0: FATAL ERRORS Found!!!" error message?
On Mon, Oct 13, 2008 at 3:38 PM, Jeff <jpotter-centos at codepuppy.com> wrote:
>
> We had the following error thrown on console on a PowerEdge server running
> CentOS 5 (64 bit). Googling around didn't yield any particular insights. The
> server crashed a few minutes after this message. Running memtester, just to
> check, didn't find anything; and the box has been running for months before
> this without issue.
>
> I'm wondering if anyone has run across this before, and if so, if it was
> software (CentOS) or hardware (PowerEdge / PowerVault) related?
>
> Oct 8 12:19:35 someServer kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
> Oct 8 12:19:35 someServer kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
> Oct 8 12:19:35 someServer kernel: EDAC MC0: UE row 1, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=0 RDWR=Write RAS=11802 CAS=0 FATAL Err=0x4)

IIRC the EDAC i5000 is the memory controller of the server, and it looks like something went wrong with a DIMM, which is probably why it crashed. So it looks like you may have an (intermittent) hardware issue.

Regards,
Tim

-- 
Tim Verhoeven - tim.verhoeven.be at gmail.com - 0479 / 88 11 83

Hoping the problem magically goes away by ignoring it is the "microsoft approach to programming" and should never be allowed. (Linus Torvalds)
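
For anyone trying to narrow down which DIMM the UE (uncorrectable error) line refers to, the row, channel and branch fields can be pulled out of the log line programmatically. Below is a minimal Python sketch, assuming the exact message layout quoted in this thread; the names UE_PATTERN and parse_ue_line are made up for illustration, and other kernel versions or EDAC drivers may format the line differently.

```python
import re

# Assumption: the pattern mirrors the i5000 EDAC line quoted in this thread.
# It is not taken from the kernel source and may not match other versions.
UE_PATTERN = re.compile(
    r"EDAC MC(?P<mc>\d+): UE row (?P<row>\d+), "
    r"channel-a= (?P<channel_a>\d+) channel-b= (?P<channel_b>\d+) .*?"
    r"\(Branch=(?P<branch>\d+) DRAM-Bank=(?P<bank>\d+) "
    r"RDWR=(?P<rdwr>\w+) RAS=(?P<ras>\d+) CAS=(?P<cas>\d+)"
)


def parse_ue_line(line):
    """Return the fields of a UE log line as a dict, or None if no match."""
    match = UE_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Numeric fields become ints; RDWR stays a string (e.g. "Write").
    return {k: (v if k == "rdwr" else int(v)) for k, v in fields.items()}


sample = (
    'Oct 8 12:19:35 someServer kernel: EDAC MC0: UE row 1, channel-a= 2 '
    'channel-b= 3 labels "-": (Branch=1 DRAM-Bank=0 RDWR=Write '
    'RAS=11802 CAS=0 FATAL Err=0x4)'
)
print(parse_ue_line(sample))
```

How a given row/channel pair maps to a physical DIMM slot depends on the board, so the parsed values are only a starting point for checking the vendor's slot layout.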