Hi,

I updated a server yesterday from "kernel 2.6.18-128.7.1.el5xen" to "kernel 2.6.18-164.el5xen".

After rebooting, my message log is flooded every second or so with these error messages:

  Oct 6 14:52:20 xenserver1 kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error on first attempt))

and

  Oct 6 15:17:23 xenserver1 kernel: EDAC MC0: CE row 0, channel 0, label "": Corrected error (Branch=0 DRAM-Bank=0 RDWR=Read RAS=0 CAS=0, CE Err=0x10000 (Correctable Non-Mirrored Demand Data ECC))

The machine is a new Tyan S5397 mobo with 16GB Kingston RAM KVR667D2D4F5K2/8G.

Moving the memory modules to different slots doesn't make any difference.

After some digging, I noticed that the new kernel has added support for the i5400 chipset. I found a reference saying that the new kernel has this error reporting capability, which the old one lacked.

Question 1: How many recoverable RAM errors are acceptable?
Question 2: The error always appears with the same ID in the error message. Mobo problem?
Question 3: Are there any recommended BIOS settings to run the RAM slower to see if the problem disappears?
Question 4: Any other proposals?

Being located in Germany makes the "just return it to the dealer" proposal quite unattractive.

best regards
---
Michael Schumacher
PAMAS Partikelmess- und Analysesysteme GmbH
Dieselstr.10, D-71277 Rutesheim
Tel +49-7152-99630
Fax +49-7152-996333
Geschäftsführer: Gerhard Schreck
Handelsregister B Stuttgart HRB 252024
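To check whether every event really carries the same ID (Question 2), the flood can be tallied by controller, error type, row and error code. The following is only a rough sketch: the regular expression is derived from the two log lines quoted above and is an assumption, since other EDAC drivers or kernel versions may format their messages differently. The names `EDAC_RE` and `tally_edac` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the two messages quoted in the post (assumption:
# the i5400 EDAC driver always uses this layout).
EDAC_RE = re.compile(
    r"EDAC (?P<mc>MC\d+): (?P<kind>CE|UE) row (?P<row>\d+),"
    r".*?Err=(?P<err>0x[0-9a-fA-F]+)"
)


def tally_edac(lines):
    """Count EDAC events by (controller, kind, row, error code)."""
    counts = Counter()
    for line in lines:
        match = EDAC_RE.search(line)
        if match:
            key = (match["mc"], match["kind"], int(match["row"]), match["err"])
            counts[key] += 1
    return counts


# The two messages from the post, with the wrapped words rejoined.
sample = [
    'Oct 6 14:52:20 xenserver1 kernel: EDAC MC0: UE row 0, channel-a= 0 '
    'channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 '
    'Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 '
    '(FB-DIMM Configuration Write error on first attempt))',
    'Oct 6 15:17:23 xenserver1 kernel: EDAC MC0: CE row 0, channel 0, '
    'label "": Corrected error (Branch=0 DRAM-Bank=0 RDWR=Read RAS=0 CAS=0, '
    'CE Err=0x10000 (Correctable Non-Mirrored Demand Data ECC))',
]

for key, count in tally_edac(sample).items():
    print(key, count)
```

Fed with the lines of the real message log instead of `sample`, a single dominant key would support the "always the same ID" observation, while several keys would point to more than one failing location.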
Run a memtest instead. If it fails, simply replace it.

On Tue, Oct 6, 2009 at 9:28 PM, Michael Schumacher <michael.schumacher at pamas.de> wrote:
> The machine is a new Tyan S5397 mobo with 16GB Kingston RAM
> KVR667D2D4F5K2/8G
>
> Removing and replacing memory to different locations doesn't make any
> difference.
On Tuesday 06 October 2009 09:28, Michael Schumacher wrote:
> Question1: how many recoverable RAM errors are acceptable?

No errors are acceptable.

> Being located in Germany makes the "just return it to the dealer"
> proposal quite unattractive.

I don't understand why you can't return the memory itself, especially since you say this is a new machine.

--
Yves Bellefeuille <yan at storm.ca>
"Yves Bellefeuille: Eterna malvenkanto en UEA"
  -- Heroldo Komunikas, n-ro 389
On Tuesday, 06.10.2009, at 15:28 +0200, Michael Schumacher wrote:
> The machine is a new Tyan S5397 mobo with 16GB Kingston RAM KVR667D2D4F5K2/8G

Simply open an RMA with Kingston; they will send you replacement memory.

Chris
financial.com AG
Hi everybody,

thanks for your immediate response. I will replace the board, but I am wondering what the error message actually means:

> Oct 16 14:07:36 xenserver1 kernel: EDAC MC0: UE row 0, channel-a= 0
> channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0
> Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000
> (FB-DIMM Configuration Write error on first attempt))

As I understand it, the system logs this error when configuration data is written to the RAM configuration. The error happens precisely once a second. Why the * would the kernel reprogram the RAM configuration once every second?

best regards
Michael Schumacher
michael.schumacher at pamas.de
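The "precisely once a second" observation can be checked by measuring the gaps between consecutive messages with the same error code. This is a minimal sketch under stated assumptions: syslog timestamps carry no year, so 2009 is assumed from the dates in this thread, and `parse_stamp` and `gaps_in_seconds` are hypothetical helper names.

```python
from datetime import datetime

ASSUMED_YEAR = 2009  # assumption: syslog lines have no year; taken from the thread


def parse_stamp(line, year=ASSUMED_YEAR):
    """Parse the leading 'Mon D HH:MM:SS' syslog timestamp of a log line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def gaps_in_seconds(lines, needle="Err=0x2000"):
    """Return the gaps, in seconds, between consecutive lines containing needle."""
    stamps = [parse_stamp(line) for line in lines if needle in line]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# Possible use (the path is an assumption; /var/log/messages is the usual
# syslog file on CentOS 5):
#
#     with open("/var/log/messages", errors="replace") as log:
#         gaps = gaps_in_seconds(log)
#     print(min(gaps), max(gaps))
```

If nearly all gaps come out as 1.0, the messages are arriving on a fixed schedule rather than in bursts. That would be consistent with something polling the error registers once per second, though the script alone cannot say what triggers the write.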