From: Uwe Doering [mailto:gemini@geminix.org]
> Jonathan Gilpin wrote:
> > I've run memtest (memtest86.com), kindly provided by Don, and it passed
> > all the tests. I've installed a kernel module to test for memory errors
> > and again found no memory errors... So this means it's either a problem
> > with the CPUs or a genuine bug in the kernel. (bugger!)
>
> No, that's unfortunately not what it means. If a memory test fails you
> can draw the conclusion that you have bad memory, but this doesn't work
> the other way round. If a memory test passes there is still a
> possibility that a memory chip is the culprit since memory test software
> cannot find all errors.
>
> Also, there is the chip set on the mainboard that coordinates bus access
> etc. for the two CPUs. Mainboard and chip set developers are known to
> make errors, too. In this case you would have to swap the entire
> mainboard, possibly with one from a different manufacturer. I can tell
> you from my own experience that it is really hard to find reliable PC
> hardware these days, in light of ever shorter and faster product release
> cycles.
I have several hundred of the motherboards the poster is using,
and they work reliably in MP operation with 4.X.
The memtest86 that I sent him understands the ECC registers
on the e7501 MCH, so it should find all correctable and uncorrectable
errors.
--don