David Wolfskill
2006-Jun-15 23:22 UTC
Help? 6.1-S: Fatal trap 12: page fault while in kernel mode
I had one of these a couple of weeks ago or so; I had been distracted by
some more urgent matters that came up (the panic was on a machine under
test; the more urgent matters were little things like needing to deploy
a handful of resolvers on our network because existing ones were running
on systems that had provided evidence of being prone to imminent
failure).
Anyway: I updated the 2 boxen under test to 6.1-STABLE as of this
morning, and finally(!) had a chance to re-try the failing operation.
It went "kaboom!" again. :-{ (Well, there's something to be said for
consistency. :-})
The setup is thus:
* On machine "C", I run smtp-sink (one of the test programs from
Postfix).
* On machine "B" (the machine & software under test), I fire up the
software being tested, which acts as an SMTP relay, accepting mail and
relaying it to machine C (where it gets counted and discarded).
* On machine "A", I have installed the mail/postal port; I run "postal,"
directing it to send mail to the SMTP server on machine B (the machine
under test).
It seems to run OK (albeit slowly) for a couple of minutes; then the
serial console reports:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 06
fault virtual address = 0x0
fault code = supervisor read, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xf09b3b98
frame pointer = 0x28:0xf09b3bcc
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 23782 (ecelerity)
[thread pid 23782 tid 100120 ]
Stopped at 0: *** error reading from address 0 ***
db> trace
Tracing pid 23782 tid 100120 td 0xcc445180
db>
Now, the software being tested apparently exercises threads quite a bit.
The hardware (for machine B) is a dual Xeon @ 3 GHz & 4 GB RAM.
The kernel config is pretty simple:
-------------%< snip! -------------------
include PAE
options SMP # Symmetric MultiProcessor Kernel
nodevice hptmv
nodevice bce
options MAXDSIZ="(2000UL*1024*1024)"
options KDB
options KDB_TRACE
options DDB
options IPFIREWALL
options IPFIREWALL_VERBOSE #enable logging to syslogd(8)
options IPFIREWALL_VERBOSE_LIMIT=0 #do not limit verbosity
options DUMMYNET
options IPDIVERT
-------------%< snip! -------------------
So: I have a pair of these machines, configured identically. Each
is connected to a terminal server for access to the serial console. I
have a private mirror of the FreeBSD CVS repository; I'm tracking RELENG_6
& HEAD on my laptop daily; I could try building CURRENT on one of these
boxen if it would help get the problem solved.
The software under test was built for FreeBSD 5.x; I have the
misc/compat5x port installed.
The vendor claims that they don't have this kind of problem with "Linux,"
and if I can't get it to run without letting the magic smoke leak out,
I'll probably end up trying to hack my way through installing some flavor
of Linux on one of the machines, which prospect I find remarkably
unappealing.
Maybe the DTrace stuff would help?
Could someone please work with me on this, so we can have a software
vendor recommending that their customers deploy their software on
FreeBSD, rather than recommending against it?
Thanks!
Peace,
david
--
David H. Wolfskill david@catwhisker.org
Doing business with spammers only encourages them. Please boycott spammers.
See http://www.catwhisker.org/~david/publickey.gpg for my public key.
Lord Reaper
2006-Jun-16 18:42 UTC
Help? 6.1-S: Fatal trap 12: page fault while in kernel mode
On Thu, 15 Jun 2006 16:22:40 -0700 David Wolfskill <david@catwhisker.org> wrote:
>Fatal trap 12: page fault while in kernel mode
>cpuid = 0; apic id = 06
>fault virtual address = 0x0
>fault code = supervisor read, page not present
>instruction pointer = 0x20:0x0
>stack pointer = 0x28:0xf09b3b98
>frame pointer = 0x28:0xf09b3bcc
>code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
>processor eflags = interrupt enabled, resume, IOPL = 0
>current process = 23782 (ecelerity)
>[thread pid 23782 tid 100120 ]
>Stopped at 0: *** error reading from address 0 ***
>

I had similar problems when updating from 5.4 to 6.1 because of
nvidia-driver. After changing the card, the system worked like a charm.

Later on, recompiling nvidia-driver (I had forgotten to deinstall it)
resulted in the machine crashing and rebooting itself. This happened with
a non-nvidia graphics adapter installed.

I remember hearing that optimizations might be the cause of the driver
failing.

Hope this helps.

Regards,
Sampsa Suoninen
Gavin Atkinson
2006-Jun-23 15:29 UTC
Help? 6.1-S: Fatal trap 12: page fault while in kernel mode
On Thu, 2006-06-15 at 16:22 -0700, David Wolfskill wrote:
> I had one of these a couple of weeks ago or so; I had been distracted by
> some more urgent matters that came up (the panic was on a machine under
> test; the more urgent matters were little things like needing to deploy
> a handful of resolvers on our network because existing ones were running
> on systems that had provided evidence of being prone to imminent
> failure).
>
> Anyway: I updated the 2 boxen under test to 6.1-STABLE as of this
> morning, and finally(!) had a chance to re-try the failing operation.
>
> It went "kaboom!" again. :-{ (Well, there's something to be said for
> consistency. :-})
>
> It seems to run OK (albeit slowly) for a couple of minutes; then the
> serial console reports:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 06
> fault virtual address = 0x0
> fault code = supervisor read, page not present
> instruction pointer = 0x20:0x0
> stack pointer = 0x28:0xf09b3b98
> frame pointer = 0x28:0xf09b3bcc
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 23782 (ecelerity)
> [thread pid 23782 tid 100120 ]
> Stopped at 0: *** error reading from address 0 ***
> db> trace
> Tracing pid 23782 tid 100120 td 0xcc445180
> db>

OK, seeing as nobody has offered any advice, I'll have a go.

Have you got a debug kernel? If so, get a kernel dump and load it into
kgdb. Chances are "bt" won't work, as the instruction pointer is zero, so
instead you need to display the stack directly:

(kgdb) x/80xw 0xf09b3b98

Look for any addresses in the 0xc0xxxxxx range; these will probably be
pointers to kernel functions. Drop out of kgdb, and try to find out which
functions these belong to:

addr2line 0xc0639bd6 -e kernel.debug
/usr/src/sys/kern/tty.c:1653

You can build up a backtrace and knowledge of the arguments given to
functions this way.

Gavin
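
The stack-scanning step described above can be sketched as a small script.
This is only an illustration, not something posted in the thread: it
assumes the "x/80xw" output has been saved as text in the usual
"address: word word word word" layout, and that kernel code addresses fall
in the 0xc0xxxxxx range mentioned above. The function name, the range
constants, and the sample dump line are all hypothetical; only 0xf09b3b98,
0xc0639bd6 and 0xcc445180 appear in the messages above.

```python
import re

# Assumed kernel text range, taken from the "0xc0xxxxxx" hint in the thread.
KERNEL_LO = 0xC0000000
KERNEL_HI = 0xC1000000


def candidate_return_addresses(dump_text, lo=KERNEL_LO, hi=KERNEL_HI):
    """Return (stack_address, value) pairs for dumped words in [lo, hi).

    dump_text is saved output of a gdb/kgdb "x/<N>xw <addr>" command,
    i.e. lines such as "0xf09b3b98:  0x00000000  0xc0639bd6 ...".
    """
    hits = []
    for line in dump_text.splitlines():
        if ":" not in line:
            continue  # not a memory-dump line
        head, _, rest = line.partition(":")
        m = re.match(r"\s*(0x[0-9a-fA-F]+)", head)
        if not m:
            continue
        base = int(m.group(1), 16)
        # Each column is one 32-bit word, 4 bytes further up the stack.
        for i, word in enumerate(re.findall(r"0x[0-9a-fA-F]{1,8}\b", rest)):
            value = int(word, 16)
            if lo <= value < hi:
                hits.append((base + 4 * i, value))
    return hits


if __name__ == "__main__":
    # Made-up dump line for illustration only; a real one comes from kgdb.
    sample = "0xf09b3b98:\t0x00000000\t0xc0639bd6\t0xcc445180\t0x00000000\n"
    for stack_addr, value in candidate_return_addresses(sample):
        # Each printed value is a candidate to pass to addr2line.
        print("stack 0x%08x -> 0x%08x" % (stack_addr, value))
```

Each value it prints is only a candidate: a word in that range may be a
saved return address, but it may equally be stale data left on the stack,
so the addr2line results still need to be read with that in mind.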