Jan Grant
2003-Oct-20 08:40 UTC
Expert input required: P4 odd signals, no apparent memory fault, DISABLE_PSE?
I'm tracking -STABLE on a 1.8GHz P4 with 512MB of memory. Roughly since the PAE changes were MFCed, I've been seeing memory-corruption-related errors under specific circumstances: for example, a run of

    portsdb -fUu

can be guaranteed to generate SIGBUS, SIGILL and SIGSEGVs in a handful of sh, sed, etc. processes. However, reverting to a 4.8 kernel from prior to September either hides/masks these errors, or no longer triggers them. The memory/mobo _appears_ to check out OK under (ferinstance) extended runs of memtest86.

Now, on -current I've seen reference to the DISABLE_PSE kernel option, and some discussion that this behaviour may be due to a processor/timing bug. So I have the following questions which I'd appreciate an expert giving a definitive opinion on (I'm no x86/hardware hacker, me):

- are these problems potentially caused by this bug?
- what exactly does DISABLE_PSE do? (it's undocumented and a one-para explanation of the expected behaviour of this option would be appreciated)
- were any commits around the time of the MFC of the PAE code liable to have introduced problems into the kernel which this workaround might address?

I know it's a lot to ask, but both hardware and OS have been rock-solid up until this point. Although I've not conclusively ruled out hardware faults, the continued stability under high load of a pre-September 4.8 kernel makes me suspicious that this is more likely to be a bug getting tickled than I'd normally suspect from these symptoms.

I'm about to experiment with this option but it currently feels a little like cargo-cult admin. If there are any definitive tests that would indicate if this hardware problem is present and addressed by this, that'd be nice to know too.

Cheers,
jan

--
jan grant, ILRT, University of Bristol. http://www.ilrt.bris.ac.uk/
Tel +44(0)117 9287088 Fax +44 (0)117 9287112 http://ioctl.org/jan/
"No generalised law is without exception." A self-demonstrating axiom.
Mike Tancsa
2003-Oct-20 09:11 UTC
Expert input required: P4 odd signals, no apparent memory fault, DISABLE_PSE?
How recent is your copy of RELENG_4? The PSE disable code was committed to the tree already, as well as a fix so it would work with APM, on the 17th. By default, PSE is disabled. If you look at your dmesg.boot you should see

Warning: Pentium 4 CPU: PSE disabled

---Mike

At 11:38 AM 20/10/2003, Jan Grant wrote:
>I'm tracking -STABLE on a 1.8GHz P4 with 512MB of memory. Roughly since
>the PAE changes were MFCed, I've been seeing memory-corruption-related
>errors under specific circumstances: for example, a run of
>        portsdb -fUu
>can be guaranteed to generate SIGBUS, SIGILL and SIGSEGVs in a handful
>of sh, sed, etc. processes.
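
[Editor's sketch] The dmesg.boot check suggested in the reply above could be scripted roughly as follows. This is a minimal sketch, not something posted in the thread: the function name check_pse is hypothetical, and the default path /var/run/dmesg.boot is an assumption about where the boot messages are saved on the machine in question. The pattern matched is the warning text quoted in the reply.

```shell
#!/bin/sh
# check_pse (hypothetical helper): report whether a saved boot log contains
# the "PSE disabled" warning quoted in the reply.
# Assumption: boot messages live in /var/run/dmesg.boot unless another
# path is given as the first argument.
check_pse() {
    log="${1:-/var/run/dmesg.boot}"
    if grep -q 'Pentium 4 CPU: PSE disabled' "$log" 2>/dev/null; then
        echo "PSE workaround active"
    else
        echo "PSE warning not found"
        return 1
    fi
}
```

Running check_pse with no argument inspects the default log; passing a path (for example a copy of dmesg.boot from another machine) checks that file instead. A "not found" result may only mean the kernel predates the commit, so it says nothing about whether the hardware is affected.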