Hey, folks,

I've just started seeing

Apr 12 13:09:59 <server> kernel: [Hardware Error]: MC4_STATUS[Over|CE|MiscV|-|AddrV|-|Poison|CECC]: 0xdd0accf2001d011b
Apr 12 13:09:59 <server> kernel: [Hardware Error]: Northbridge Error (node 1, core 1): ECC error in L3 cache tag.
Apr 12 13:09:59 <server> kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
Apr 12 13:09:59 <server> kernel: [Hardware Error]: Machine check events logged

I'm guessing, unhappily, that this means the on-chip cache, not a DIMM. Does anyone here know?

mark, who'll be the one to call it in under warranty....
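If you want to check the kernel's decoding yourself, here is a small Python sketch that rebuilds the first and third log lines from the raw MC4_STATUS value. The flag positions for Over/UC/MiscV/AddrV/PCC (bits 63..57) come from the x86 machine-check architecture. The memory-hierarchy error-code layout (0000 0001 RRRR TTLL in bits 15:0) is the one the Linux AMD decoder checks for. The Deferred (bit 44), Poison (bit 43) and ECC (bits 46:45) positions are an assumption, taken from how that decoder labeled AMD family 15h parts at the time, and they may differ on other CPU families. The function names are made up for illustration. The sketch does not reproduce the "ECC error in L3 cache tag" text, because that comes from a family-specific extended-error-code table.

```python
# Bits 63..57 follow the x86 MCA MCi_STATUS layout.
VAL, OVER, UC, EN, MISCV, ADDRV, PCC = 63, 62, 61, 60, 59, 58, 57
# ASSUMPTION: AMD family 15h positions as labeled by the Linux AMD MCE decoder.
DEFERRED, POISON = 44, 43

# Lookup tables for the memory-hierarchy error code fields (LL, TT, RRRR).
LL = ["L0", "L1", "L2", "L3/GEN"]
TT = ["INSN", "DATA", "GEN", "RESV"]
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]


def bit(status: int, n: int) -> bool:
    """Return True if bit n of the 64-bit status word is set."""
    return ((status >> n) & 1) == 1


def status_flags(status: int) -> str:
    """Rebuild the bracketed flag summary the kernel prints after MCx_STATUS."""
    parts = [
        "Over" if bit(status, OVER) else "-",
        "UE" if bit(status, UC) else "CE",
        "MiscV" if bit(status, MISCV) else "-",
        "PCC" if bit(status, PCC) else "-",
        "AddrV" if bit(status, ADDRV) else "-",
        "Deferred" if bit(status, DEFERRED) else "-",
        "Poison" if bit(status, POISON) else "-",
    ]
    ecc = (status >> 45) & 0b11  # bits 46:45; 2 means corrected, 1 or 3 uncorrected
    if ecc:
        parts.append("CECC" if ecc == 2 else "UECC")
    return "|".join(parts)


def memory_error_fields(status: int) -> str:
    """Decode the cache level / transaction / memory-transaction fields."""
    ec = status & 0xFFFF  # bits 15:0 hold the MCA error code
    if (ec & 0xFF00) != 0x0100:  # memory errors look like 0000 0001 RRRR TTLL
        raise ValueError(f"not a memory-hierarchy error code: {ec:#06x}")
    rrrr = (ec >> 4) & 0xF
    tt = (ec >> 2) & 0x3
    ll = ec & 0x3
    mem_tx = RRRR[rrrr] if rrrr < len(RRRR) else "RESV"
    return f"cache level: {LL[ll]}, tx: {TT[tt]}, mem-tx: {mem_tx}"


if __name__ == "__main__":
    status = 0xdd0accf2001d011b  # value from the log above
    print(f"MC4_STATUS[{status_flags(status)}]: {status:#018x}")
    print(memory_error_fields(status))
```

Run against the logged value, it prints the same flag string and the same "cache level: L3/GEN, tx: GEN, mem-tx: RD" line as the kernel. In particular, UC is clear (CE) and the ECC field reads "corrected", so the hardware reported this event as corrected.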
On Thursday 12 April 2012 13.36.03 m.roth at 5-cent.us wrote:
> Hey, folks,
>
> I've just started seeing
> Apr 12 13:09:59 <server> kernel: [Hardware Error]:
> MC4_STATUS[Over|CE|MiscV|-|AddrV|-|Poison|CECC]: 0xdd0accf2001d011b
> Apr 12 13:09:59 <server> kernel: [Hardware Error]: Northbridge Error (node
> 1, core 1): ECC error in L3 cache tag.

The error message certainly points to the CPU. The fact that the error happened in the cache tag, not the cache data, further implicates the CPU. The message is quite specific, and I'd say rather trustworthy...

But there's also the possibility that the message is wrong (either something else went wrong, or nothing really went wrong). In my experience, hardware fault error messages are quite unreliable, and at the end of the day DIMMs are orders of magnitude more likely to fail than CPUs...

/Peter