Dario Lesca
2016-May-03 10:15 UTC
[CentOS] Centos 6.7: kernel: EDAC MC0: CE row 2, channel 1, label "": (..... (Correctable Patrol Data ECC))
After updating from CentOS 6.6 to CentOS 6.7 and rebooting, I get a lot of these errors in /var/log/messages:

> May  3 11:27:20 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=2 RDWR=Read RAS=6093 CAS=896, CE Err=0x10000 (Correctable Patrol Data ECC))
> May  3 11:27:21 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=1 RDWR=Read RAS=1330 CAS=4, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
> May  3 11:27:22 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=2 RDWR=Read RAS=2673 CAS=4, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
> May  3 11:27:23 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=2 RDWR=Read RAS=1335 CAS=4, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
> May  3 11:27:24 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=2 RDWR=Read RAS=1335 CAS=4, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
> May  3 11:27:25 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=2 RDWR=Read RAS=240 CAS=4, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
> May  3 11:27:26 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=1796 CAS=900, CE Err=0x10000 (Correctable Patrol Data ECC))
> May  3 11:27:27 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=1337 CAS=4, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
> May  3 11:27:28 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=3094 CAS=900, CE Err=0x10000 (Correctable Patrol Data ECC))
> May  3 11:27:29 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=240 CAS=6, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
> May  3 11:27:30 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=3 RDWR=Read RAS=240 CAS=6, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))

I ran "yum install edac-utils -y" and got this output:

> [root@s-virt ~]# edac-util -v
> mc0: 0 Uncorrected Errors with no DIMM info
> mc0: 0 Corrected Errors with no DIMM info
> mc0: csrow0: 0 Uncorrected Errors
> mc0: csrow0: ch0: 0 Corrected Errors
> mc0: csrow0: ch1: 0 Corrected Errors
> mc0: csrow0: ch2: 0 Corrected Errors
> mc0: csrow0: ch3: 0 Corrected Errors
> mc0: csrow1: 0 Uncorrected Errors
> mc0: csrow1: ch0: 0 Corrected Errors
> mc0: csrow1: ch1: 0 Corrected Errors
> mc0: csrow1: ch2: 0 Corrected Errors
> mc0: csrow1: ch3: 0 Corrected Errors
> mc0: csrow2: 0 Uncorrected Errors
> mc0: csrow2: ch0: 0 Corrected Errors
> mc0: csrow2: ch1: 80384 Corrected Errors
> mc0: csrow2: ch2: 0 Corrected Errors
> mc0: csrow2: ch3: 0 Corrected Errors
> mc0: csrow3: 0 Uncorrected Errors
> mc0: csrow3: ch0: 0 Corrected Errors
> mc0: csrow3: ch1: 8 Corrected Errors
> mc0: csrow3: ch2: 0 Corrected Errors
> mc0: csrow3: ch3: 0 Corrected Errors

The server is a:

> [root@s-virt ~]# lshw
> s-virt.dom.it
>     description: Tower Computer
>     product: ProLiant ML370 G5 (433752-421)
>     vendor: HP
>     serial: GBxxxxxxxM
>     width: 64 bits
>     capabilities: smbios-2.4 dmi-2.4 vsyscall64 vsyscall32
>     configuration: boot=hardware-failure-fw chassis=tower family=ProLiant sku=433752-421 uuid=34333337-3532-4742-3837-35303557534D

with this RAM installed:

> [root@s-virt ~]# dmidecode -t memory|grep Size
>         Size: 1024 MB
>         Size: 2048 MB
>         Size: No Module Installed
>         Size: No Module Installed
>         Size: 1024 MB
>         Size: 2048 MB
>         Size: No Module Installed
>         Size: No Module Installed
>         Size: 1024 MB
>         Size: 4096 MB
>         Size: No Module Installed
>         Size: No Module Installed
>         Size: 1024 MB
>         Size: 4096 MB
>         Size: No Module Installed
>         Size: No Module Installed

I'm not a hardware guru and I don't know how to "decrypt" the log messages and the command output.

What problem is the log reporting?
What should I do?

Many thanks for your help.

--
Dario Lesca
(sent from my Linux Fedora 23 Workstation)
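Editorial note: in the edac-util -v output above, only two counters are non-zero (csrow2/ch1 and csrow3/ch1). On a longer listing those can be picked out mechanically. The following is a minimal Python sketch, assuming the output has already been captured as text; the function name nonzero_corrected and the sample string are illustrative and not part of edac-utils.

```python
import re

# Matches per-channel lines such as "mc0: csrow2: ch1: 80384 Corrected Errors".
CHANNEL_LINE = re.compile(
    r"^(mc\d+): (csrow\d+): (ch\d+): (\d+) Corrected Errors$"
)


def nonzero_corrected(edac_util_output):
    """Return {(mc, csrow, channel): count} for channels with count > 0."""
    hits = {}
    for line in edac_util_output.splitlines():
        match = CHANNEL_LINE.match(line.strip())
        if match:
            mc, csrow, channel, count = match.groups()
            if int(count) > 0:
                hits[(mc, csrow, channel)] = int(count)
    return hits


# A few lines copied from the output in the message above.
sample = """\
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: ch0: 0 Corrected Errors
mc0: csrow2: ch1: 80384 Corrected Errors
mc0: csrow3: ch1: 8 Corrected Errors
"""

for location, count in sorted(nonzero_corrected(sample).items()):
    print(location, count)
```

The result only says where the memory controller counted corrected errors. Mapping a csrow/channel pair to a physical DIMM slot depends on the board and is not covered here.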
Dario Lesca
2016-May-03 12:02 UTC
[CentOS] Centos 6.7: kernel: EDAC MC0: CE row 2, channel 1, label "": (..... (Correctable Patrol Data ECC))
On Tue, 03/05/2016 at 12:15 +0200, Dario Lesca wrote:

> After updating from CentOS 6.6 to CentOS 6.7 and rebooting, I get a lot of these errors in /var/log/messages:
>
> > May  3 11:27:20 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=2 RDWR=Read RAS=6093 CAS=896, CE Err=0x10000 (Correctable Patrol Data ECC))
> ...
>
> What problem is the log reporting?
> What should I do?
>
> Many thanks for your help.

I found this suggestion:

> As per logs, you are getting CE (Corrected Error) messages in the system. And you can ignore them. Edit grub.conf and add mce=dont_log_ce to the kernel line which will stop corrected error messages to log in file.
>
> But it always good to run memory check in the system.

http://serverfault.com/questions/531110/var-log-messages-showing-lots-of-ce-err-0x2000-even-on-unused-banks-slots

Adding mce=dont_log_ce to grub.conf implies a reboot.
Is it possible to stop the log messages without a reboot?

Thanks

--
Dario Lesca
(sent from my Linux Fedora 23 Workstation)
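Editorial note: the quoted suggestion amounts to appending mce=dont_log_ce to the kernel line(s) of grub.conf. A minimal Python sketch of that edit follows, working on the file contents as a string. The function name add_kernel_param and the sample entry (including the EXAMPLE file names and root device) are hypothetical. Whether this option actually silences the EDAC messages on this machine is not confirmed anywhere in this thread.

```python
def add_kernel_param(grub_conf_text, param="mce=dont_log_ce"):
    """Append param to every GRUB legacy 'kernel' line that lacks it."""
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        # Only touch boot-entry lines whose first word is "kernel".
        if words and words[0] == "kernel" and param not in words:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical grub.conf entry, for illustration only.
sample = (
    "title CentOS (example entry)\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz-EXAMPLE ro root=/dev/sda1 quiet\n"
    "\tinitrd /initramfs-EXAMPLE.img\n"
)

print(add_kernel_param(sample))
```

The function leaves lines that already carry the parameter untouched, so running it twice gives the same result. Keep a backup copy of grub.conf before writing any change back; the new parameter only takes effect at the next boot.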
Michael Schumacher
2016-May-03 12:54 UTC
[CentOS] Centos 6.7: kernel: EDAC MC0: CE row 2, channel 1, label "": (..... (Correctable Patrol Data ECC))
Tuesday, May 3, 2016, 12:15:21 PM, you wrote:

DL> After update from centos 6.6 to centos 6.7 and reboot it, I have get a
DL> lot of this error into /var/log/messages:

>> May  3 11:27:20 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=2 RDWR=Read RAS=6093 CAS=896, CE Err=0x10000 (Correctable Patrol Data ECC))

Hi Dario,

I had a similar case in the past. I had a brand new server that seemed to be running fine. After a kernel update, I got a lot of error messages in /var/log/messages. That particular kernel update added error messages related to the motherboard chipset.

It appears that new boards with new chipsets run fine with existing kernels. As soon as kernel updates incorporate special features of these chipsets, you might get such messages.

I returned the board under warranty. The board manufacturer informed me that a memory controller was faulty. The board had the hardware error from the beginning; the new kernel only revealed an already existing problem.

You may want to check first whether the hardware problem comes from the memory or from the motherboard.

best regards

---
Michael Schumacher
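Editorial note: a first step toward the check suggested above is to see how the logged errors are distributed. The Python sketch below tallies EDAC CE lines by (row, channel) and by error description, assuming the log format shown in the first message; the regular expression and the name tally_ce are illustrative. A concentration on one row/channel does not by itself distinguish a bad memory module from a bad memory controller.

```python
import re
from collections import Counter

# Pattern for lines like:
# "... kernel: EDAC MC0: CE row 2, channel 1, label "": (Branch=0 DRAM-Bank=2
#  ... CE Err=0x10000 (Correctable Patrol Data ECC))"
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE row (?P<row>\d+), channel (?P<channel>\d+),"
    r".*\((?P<kind>Correctable [^)]*)\)\)"
)


def tally_ce(lines):
    """Count corrected errors per (row, channel) and per error description."""
    by_location = Counter()
    by_kind = Counter()
    for line in lines:
        match = EDAC_CE.search(line)
        if match:
            by_location[(int(match["row"]), int(match["channel"]))] += 1
            by_kind[match["kind"]] += 1
    return by_location, by_kind


# Three lines copied from the log excerpt in the first message.
sample = [
    'May  3 11:27:20 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": '
    "(Branch=0 DRAM-Bank=2 RDWR=Read RAS=6093 CAS=896, CE Err=0x10000 "
    "(Correctable Patrol Data ECC))",
    'May  3 11:27:21 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": '
    "(Branch=0 DRAM-Bank=1 RDWR=Read RAS=1330 CAS=4, CE Err=0x2000 "
    "(Correctable Non-Mirrored Demand Data ECC))",
    'May  3 11:27:26 s-virt kernel: EDAC MC0: CE row 2, channel 1, label "": '
    "(Branch=0 DRAM-Bank=3 RDWR=Read RAS=1796 CAS=900, CE Err=0x10000 "
    "(Correctable Patrol Data ECC))",
]

by_location, by_kind = tally_ce(sample)
print(dict(by_location))
print(dict(by_kind))
```

To run it against the real log, pass the lines of /var/log/messages to tally_ce instead of the sample list.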