David Wolfskill
2006-Jun-15 23:22 UTC
Help? 6.1-S: Fatal trap 12: page fault while in kernel mode
I had one of these a couple of weeks ago or so; I had been distracted by some more urgent matters that came up (the panic was on a machine under test; the more urgent matters were little things like needing to deploy a handful of resolvers on our network because existing ones were running on systems that had provided evidence of being prone to imminent failure).

Anyway: I updated the 2 boxen under test to 6.1-STABLE as of this morning, and finally(!) had a chance to re-try the failing operation.

It went "kaboom!" again. :-{ (Well, there's something to be said for consistency. :-})

The setup is thus:

* On machine "C", I run smtp-sink (one of the test programs from Postfix).

* On machine "B" (the machine & software under test), I fire up the software being tested, which acts as an SMTP relay, accepting mail and relaying it to machine C (where it gets counted and discarded).

* On machine "A", I have installed the mail/postal port; I run "postal," directing it to send mail to the SMTP server on machine B (the machine under test).

It seems to run OK (albeit slowly) for a couple of minutes; then the serial console reports:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 06
fault virtual address = 0x0
fault code = supervisor read, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xf09b3b98
frame pointer = 0x28:0xf09b3bcc
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 23782 (ecelerity)
[thread pid 23782 tid 100120 ]
Stopped at 0: *** error reading from address 0 ***
db> trace
Tracing pid 23782 tid 100120 td 0xcc445180
db>

Now, the software being tested apparently exercises threads quite a bit.

The hardware (for machine B) is a dual Xeon @ 3 GHz & 4 GB RAM. The kernel config is pretty simple:

-------------%< snip! -------------------
include PAE
options SMP # Symmetric MultiProcessor Kernel
nodevice hptmv
nodevice bce
options MAXDSIZ="(2000UL*1024*1024)"
options KDB
options KDB_TRACE
options DDB
options IPFIREWALL
options IPFIREWALL_VERBOSE #enable logging to syslogd(8)
options IPFIREWALL_VERBOSE_LIMIT=0 #do not limit verbosity
options DUMMYNET
options IPDIVERT
-------------%< snip! -------------------

So: I have a pair of these machines, configured identically. Each is connected to a terminal server for access to the serial console. I have a private mirror of the FreeBSD CVS repository; I'm tracking RELENG_6 & HEAD on my laptop daily; I could try building CURRENT on one of these boxen if it would help get the problem solved.

The software under test was built for FreeBSD 5.x; I have the misc/compat5x port installed.

The vendor claims that they don't have this kind of problem with "Linux," and if I can't get it to run without letting the magic smoke leak out, I'll probably end up trying to hack my way through installing some flavor of Linux on one of the machines, which prospect I find remarkably unappealing.

Maybe the DTrace stuff would help?

Could someone please work with me on this, so we can have a software vendor recommending that their customers deploy their software on FreeBSD, rather than recommending against it?

Thanks!

Peace,
david
-- 
David H. Wolfskill david@catwhisker.org
Doing business with spammers only encourages them. Please boycott spammers.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
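The trap report above has both the fault virtual address and the instruction pointer at 0x0, which usually means the CPU tried to fetch an instruction from address 0, i.e. the kernel jumped or called through a NULL function pointer. That would also explain why `db> trace` printed nothing. The sketch below reads the report mechanically. It assumes the report was saved as plain text from the serial console; the function names, the 0x1000 "near-NULL" threshold and the classification strings are a hypothetical heuristic, not anything the kernel itself prints. If the fault did come from a `call`, the word at the reported stack pointer is likely the caller's return address, so the sketch prints a kgdb command to look at it.

```python
import re

# Keys are lower-case words before '='; continuation lines that start with '=' are skipped.
FIELD = re.compile(r"\s*([a-z][a-z ]*?)\s*=\s*(.+?)\s*$")


def parse_trap_report(text: str) -> dict[str, str]:
    """Collect the 'key = value' lines from an i386 'Fatal trap' console report."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def offset(field: str) -> int:
    """'0x20:0x0' (selector:offset) -> 0x0; a bare '0x0' is parsed as-is."""
    return int(field.rpartition(":")[2], 16)


def classify(fields: dict[str, str]) -> str:
    """Hypothetical heuristic: what kind of bad pointer produced the page fault?"""
    eip = offset(fields["instruction pointer"])
    fault_va = offset(fields["fault virtual address"])
    if eip == 0 and fault_va == 0:
        # The faulting access was the instruction fetch at address 0 itself.
        return "jump/call through a NULL function pointer"
    if fault_va < 0x1000:
        # Page 0 is normally unmapped, so small offsets from a NULL struct pointer land here.
        return "NULL (or near-NULL) data pointer dereference"
    return "other bad pointer"


if __name__ == "__main__":
    report = """\
fault virtual address = 0x0
fault code = supervisor read, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xf09b3b98
frame pointer = 0x28:0xf09b3bcc
             = DPL 0, pres 1, def32 1, gran 1
current process = 23782 (ecelerity)
"""
    f = parse_trap_report(report)
    print(classify(f))
    # Assumption: a 'call' pushed its return address just before the fault,
    # so it should be the word at the reported stack pointer.
    print(f"(kgdb) x/xw {hex(offset(f['stack pointer']))}")
```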
Lord Reaper
2006-Jun-16 18:42 UTC
Help? 6.1-S: Fatal trap 12: page fault while in kernel mode
On Thu, 15 Jun 2006 16:22:40 -0700
David Wolfskill <david@catwhisker.org> wrote:

>Fatal trap 12: page fault while in kernel mode
>cpuid = 0; apic id = 06
>fault virtual address = 0x0
>fault code = supervisor read, page not present
>instruction pointer = 0x20:0x0
>stack pointer = 0x28:0xf09b3b98
>frame pointer = 0x28:0xf09b3bcc
>code segment = base 0x0, limit 0xfffff, type 0x1b
>             = DPL 0, pres 1, def32 1, gran 1
>processor eflags = interrupt enabled, resume, IOPL = 0
>current process = 23782 (ecelerity)
>[thread pid 23782 tid 100120 ]
>Stopped at 0: *** error reading from address 0 ***
>

I had similar problems when updating from 5.4 to 6.1 because of the nvidia-driver. After I changed the graphics card, the system worked like a charm. Later on, recompiling nvidia-driver (I forgot to deinstall it) caused the machine to crash and reboot itself, even with a non-nvidia graphics adapter installed. I remember hearing that optimizations might be the cause of the driver failing.

Hope this helps.

Regards,
Sampsa Suoninen
Gavin Atkinson
2006-Jun-23 15:29 UTC
Help? 6.1-S: Fatal trap 12: page fault while in kernel mode
On Thu, 2006-06-15 at 16:22 -0700, David Wolfskill wrote:
> I had one of these a couple of weeks ago or so; I had been distracted by
> some more urgent matters that came up (the panic was on a machine under
> test; the more urgent matters were little things like needing to deploy
> a handful of resolvers on our network because existing ones were running
> on systems that had provided evidence of being prone to imminent
> failure).
>
> Anyway: I updated the 2 boxen under test to 6.1-STABLE as of this
> morning, and finally(!) had a chance to re-try the failing operation.
>
> It went "kaboom!" again. :-{ (Well, there's something to be said for
> consistency. :-})
>
> It seems to run OK (albeit slowly) for a couple of minutes; then the
> serial console reports:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 06
> fault virtual address = 0x0
> fault code = supervisor read, page not present
> instruction pointer = 0x20:0x0
> stack pointer = 0x28:0xf09b3b98
> frame pointer = 0x28:0xf09b3bcc
> code segment = base 0x0, limit 0xfffff, type 0x1b
>              = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 23782 (ecelerity)
> [thread pid 23782 tid 100120 ]
> Stopped at 0: *** error reading from address 0 ***
> db> trace
> Tracing pid 23782 tid 100120 td 0xcc445180
> db>

OK, seeing as nobody has offered any advice, I'll have a go.

Have you got a debug kernel? If so, get a kernel dump and load it into kgdb. Chances are "bt" won't work, as the instruction pointer is zero, so instead you need to display the stack directly:

(kgdb) x/80xw 0xf09b3b98

Look for any addresses in the 0xc0xxxxxx range; these will probably be pointers to kernel functions. Drop out of kgdb and try to find out which functions these belong to:

addr2line 0xc0639bd6 -e kernel.debug
/usr/src/sys/kern/tty.c:1653

You can build up a backtrace, and knowledge of the arguments given to functions, this way.

Gavin
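Gavin's manual procedure (dump the stack with `x/80xw`, keep the words in the 0xc0xxxxxx range, then resolve each one) can be sketched as a small helper. This is a minimal sketch under stated assumptions: it assumes kgdb prints each line as `address:<TAB>word word ...`, it treats 0xc0000000 to 0xc1000000 as the kernel-text window (check this against your own kernel), and besides building an `addr2line -f` command it offers a swapped-in alternative resolver that reads `nm -n kernel.debug` output and maps each address to the nearest preceding text symbol. The sample stack words (other than 0xc0639bd6 from the example above) and the symbol names `func_a`/`func_b` are hypothetical. Two caveats: a word in this range may be stale data or a pointer into kernel data rather than a return address, and a return address points just past the `call`, so the source line reported may be the one after the call (subtracting 1 before resolving is a common trick).

```python
import bisect
import re

# Assumed kernel-text window for this i386 kernel ("0xc0xxxxxx"); verify for your build.
KTEXT_LO, KTEXT_HI = 0xC0000000, 0xC1000000

HEX_WORD = re.compile(r"0x[0-9a-fA-F]{1,8}\b")


def candidate_addresses(x_output: str) -> list[int]:
    """Words from kgdb 'x/NNxw' output that fall in the kernel-text window, in stack order."""
    found: list[int] = []
    for line in x_output.splitlines():
        # Assumed line shape: "0xf09b3b98:<TAB>0xc0639bd6<TAB>0x00000000 ..."
        _, sep, words = line.partition(":")
        if not sep:
            continue  # not a memory-dump line
        for token in HEX_WORD.findall(words):
            value = int(token, 16)
            if KTEXT_LO <= value < KTEXT_HI and value not in found:
                found.append(value)
    return found


def addr2line_command(addrs: list[int], kernel: str = "kernel.debug") -> list[str]:
    """One addr2line invocation for all candidates; -f also prints each function name."""
    return ["addr2line", "-f", "-e", kernel] + [hex(a) for a in addrs]


def load_text_symbols(nm_output: str) -> tuple[list[int], list[str]]:
    """Parse 'nm -n' lines like 'c0639a10 T name', keeping only text (T/t) symbols."""
    pairs = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] in ("T", "t"):
            pairs.append((int(parts[0], 16), parts[2]))
    pairs.sort()
    return [addr for addr, _ in pairs], [name for _, name in pairs]


def symbolize(addr: int, starts: list[int], names: list[str]) -> str:
    """Map an address to 'symbol+0xoff' using the nearest text symbol at or below it."""
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return hex(addr)
    return f"{names[i]}+{hex(addr - starts[i])}"


if __name__ == "__main__":
    # Hypothetical kgdb output and symbol table, for illustration only.
    stack = ("0xf09b3b98:\t0xc0639bd6\t0x00000000\t0xcc445180\t0xf09b3bcc\n"
             "0xf09b3ba8:\t0xc0512345\t0x00000001\t0xc0639bd6\t0x00000000\n")
    symbols = "c0500000 T func_a\nc0600000 t func_b\n"
    cands = candidate_addresses(stack)
    print(" ".join(addr2line_command(cands)))
    starts, names = load_text_symbols(symbols)
    for a in cands:
        print(hex(a), symbolize(a, starts, names))
```

The printed `addr2line` line can be pasted into a shell on a machine that has the matching `kernel.debug`; the `nm`-based path is handy when you want offsets within functions rather than source lines.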