I've got a production system running CentOS 4 that was rock solid until I upgraded from 2.6.9-55 to 2.6.9-78.0.13 (now running 2.6.9-89.0.11). The system now crashes intermittently after a few weeks. I finally caught the panic message:

EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
Kernel panic - not syncing: MC0: Uncorrected Error

Looking at the kernel changelog, I see that EDAC support for the Intel 5000 chipset, which this server uses, was added in 2.6.9-68.20.EL.

I'm trying to determine whether this is a memory issue or is related to some other hardware component. I'm also considering disabling EDAC in the kernel (is "noedac" a valid option?) as a last resort. I will run memtest86+ on the server as soon as possible to check the memory; I'm just formulating my game plan in case it's something else.

Thoughts?

Chris
Chris Miller wrote:
> Thoughts?

Check your BIOS/system event log for any indication that it is logging memory errors. Most server-class motherboards from the past 5 years do this, though not always reliably.

I've also had trouble with memtest86 myself; I prefer to run ctcs:

http://sourceforge.net/projects/va-ctcs/

The software is really old and is picky about what you build it on. If I recall correctly, I could only get it to build on RHEL/CentOS 4, not 5 (though the binaries work fine on 5). It does a good torture test which, in my experience, can find problems faster than memtest86 (which can take days).

nate
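In the same spirit as checking the event log, the kernel's own EDAC error counters can be read directly when the driver exposes them. The sketch below is illustrative only: it assumes the sysfs layout used by mainline EDAC (/sys/devices/system/edac/mc/mcN/ce_count and ue_count), which may not match what the 2.6.9 backport provides, and it assumes a Python 3 interpreter is available. The function name read_edac_counts is made up for this example.

```python
import glob
import os


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int or None, "ue": int or None}}.

    ASSUMPTION: 'base' follows the mainline EDAC sysfs layout; older or
    backported kernels may expose the counters elsewhere or not at all.
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(base, "mc[0-9]*"))):
        entry = {}
        for key, fname in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                with open(os.path.join(mc_dir, fname)) as fh:
                    entry[key] = int(fh.read().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[os.path.basename(mc_dir)] = entry
    return counts


if __name__ == "__main__":
    results = read_edac_counts()
    if not results:
        print("no EDAC memory controllers found under sysfs")
    for mc, c in sorted(results.items()):
        print("%s: corrected=%s uncorrected=%s" % (mc, c["ce"], c["ue"]))
```

A corrected-error count that keeps climbing between checks would point toward a marginal DIMM; counters that stay at zero right up to a panic would fit better with a driver or chipset-detection problem, though neither result is conclusive on its own.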
Chris,

> I've got a production system running CentOS 4 that was rock solid
> until I upgraded from 2.6.9-55 to 2.6.9-78.0.13 (now running
> 2.6.9-89.0.11). The system now crashes intermittently after a few
> weeks. I finally caught the panic message :

> EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
> Kernel panic - not syncing: MC0: Uncorrected Error

> Looking at the kernel changelog, I see that EDAC support was added
> for the Intel 5000 chipset in 2.6.9-68.20.EL which this server runs.

Same issue here with a machine running CentOS 5.3. The problem began with a kernel update that introduced support for the 5000 chipset. See the thread "RAM errors after kernel-update" for more details. I haven't been able to solve the problem yet, but because the machine crashes every two days with this kernel, I had to boot an earlier kernel without the chipset support.

> I'm trying to determine if this is a potential memory issue, or is
> this related to some other hardware item. Also considering disabling
> EDAC in the kernel (is "noedac" a valid option?) as a last resort. I
> will run memtest86+ on the server as soon as possible to check the
> memory, just formulating my game plan if it's something else.

Don't use the memtest86+ version that comes with the CentOS ISO. There is a much newer version available from the authors' website. Only the new version identifies the chipset correctly.

-- 
Best regards,
Michael Schumacher
mailto:michael.schumacher at pamas.de
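When comparing the older kernel against the newer one, it can help to confirm whether an EDAC driver is actually loaded on each boot. The following is a rough sketch, not a definitive check: it reads /proc/modules and lists any module whose name contains "edac". The module names in the comments (such as i5000_edac) are assumptions based on mainline naming, and if EDAC is compiled into the kernel rather than built as a module, nothing will show up here. The function name loaded_edac_modules is hypothetical.

```python
def loaded_edac_modules(proc_modules_text):
    """Return the names of loaded modules whose name contains 'edac'.

    Each line of /proc/modules starts with the module name, followed by
    size, reference count, dependencies, state and load address.
    ASSUMPTION: EDAC drivers carry 'edac' in the module name
    (for example i5000_edac in mainline kernels).
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and "edac" in fields[0].lower():
            names.append(fields[0])
    return names


if __name__ == "__main__":
    try:
        with open("/proc/modules") as fh:
            found = loaded_edac_modules(fh.read())
    except OSError:
        found = []  # /proc/modules not available on this system
    print("EDAC modules loaded:", ", ".join(found) if found else "none found")
```

If a driver does appear on the crashing kernel but not on the stable one, that at least narrows the difference between the two boots to the EDAC code path.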