Frank Razenberg
2015-Oct-14 13:52 UTC
10.2-STABLE amd64 panic: page fault while in kernel mode
After upgrading from 9.2 to 10.1 I first started noticing panics. They
occurred roughly weekly and since this storage machine isn't frequently
used I didn't look into it much further. After updating to 10.2-STABLE
the panics have gone from weekly to daily.
The machine has 32GB of non-registered ECC DDR3-1066 RAM. There's also a
10-disk raidz2 pool. I've run memtest86+ for 72 hours straight with no
errors.
Crash dumps all feature the following:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 12
fault virtual address = 0x1d1c0bec0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff804fda65
stack pointer = 0x28:0xfffffe0698f21870
frame pointer = 0x28:0xfffffe0698f218d0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 6106 (pickup)
trap number = 12
panic: page fault
cpuid = 2
(kgdb) bt
#0 doadump (textdump=<value optimized out>) at pcpu.h:219
#1 0xffffffff8053ce32 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:455
#2 0xffffffff8053d215 in vpanic (fmt=<value optimized out>, ap=<value
optimized out>) at /usr/src/sys/kern/kern_shutdown.c:762
#3 0xffffffff8053d0a3 in panic (fmt=0x0) at
/usr/src/sys/kern/kern_shutdown.c:691
#4 0xffffffff807755db in trap_fatal (frame=<value optimized out>,
eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
#5 0xffffffff807758dd in trap_pfault (frame=0xfffffe0698dbc7c0,
usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:674
#6 0xffffffff80774f7a in trap (frame=0xfffffe0698dbc7c0) at
/usr/src/sys/amd64/amd64/trap.c:440
#7 0xffffffff8075b0f2 in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:236
#8 0xffffffff804fda65 in kqueue_close (fp=0xfffff803e4967190,
td=0xfffff80014b094a0) at /usr/src/sys/kern/kern_event.c:1750
#9 0xffffffff804f25f9 in _fdrop (fp=0xfffff803e4967190,
td=0xfffff802b5d2a000) at file.h:343
#10 0xffffffff804f4e9e in closef (fp=<value optimized out>, td=<value
optimized out>) at /usr/src/sys/kern/kern_descrip.c:2338
#11 0xffffffff804f4ab9 in fdescfree (td=0xfffff80014b094a0) at
/usr/src/sys/kern/kern_descrip.c:2106
#12 0xffffffff805013a9 in exit1 (td=0xfffff80014b094a0, rv=<value
optimized out>) at /usr/src/sys/kern/kern_exit.c:369
#13 0xffffffff80500e3e in sys_sys_exit (td=0xfffffe000782e060,
uap=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:179
#14 0xffffffff80775efd in amd64_syscall (td=0xfffff80014b094a0,
traced=0) at subr_syscall.c:134
#15 0xffffffff8075b3db in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396
#16 0x000000080120335a in ?? ()
Most of the dumps list 'pickup' as current process. All of them have
'kqueue_close' in the backtrace.
I'm not sure what the next step in diagnosing the issue is. Any pointers
would be greatly appreciated.
-Frank
Konstantin Belousov
2015-Oct-14 14:42 UTC
10.2-STABLE amd64 panic: page fault while in kernel mode
On Wed, Oct 14, 2015 at 03:52:47PM +0200, Frank Razenberg wrote:
> After upgrading from 9.2 to 10.1 I first started noticing panics. They
> occurred roughly weekly and since this storage machine isn't frequently
> used I didn't look into it much further. After updating to 10.2-STABLE
> the panics have gone from weekly to daily.
> The machine has 32GB of non-registered ECC DDR3-1066 RAM. There's also a
> 10-disk raidz2 pool. I've run memtest86+ for 72 hours straight with no
> errors.
>
> Crash dumps all feature the following:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 12
> fault virtual address = 0x1d1c0bec0
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff804fda65
> stack pointer = 0x28:0xfffffe0698f21870
> frame pointer = 0x28:0xfffffe0698f218d0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 6106 (pickup)
> trap number = 12
> panic: page fault
> cpuid = 2
>
>
> (kgdb) bt
> #0 doadump (textdump=<value optimized out>) at pcpu.h:219
> #1 0xffffffff8053ce32 in kern_reboot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:455
> #2 0xffffffff8053d215 in vpanic (fmt=<value optimized out>, ap=<value
> optimized out>) at /usr/src/sys/kern/kern_shutdown.c:762
> #3 0xffffffff8053d0a3 in panic (fmt=0x0) at
> /usr/src/sys/kern/kern_shutdown.c:691
> #4 0xffffffff807755db in trap_fatal (frame=<value optimized out>,
> eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
> #5 0xffffffff807758dd in trap_pfault (frame=0xfffffe0698dbc7c0,
> usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:674
> #6 0xffffffff80774f7a in trap (frame=0xfffffe0698dbc7c0) at
> /usr/src/sys/amd64/amd64/trap.c:440
> #7 0xffffffff8075b0f2 in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:236
> #8 0xffffffff804fda65 in kqueue_close (fp=0xfffff803e4967190,
> td=0xfffff80014b094a0) at /usr/src/sys/kern/kern_event.c:1750
> #9 0xffffffff804f25f9 in _fdrop (fp=0xfffff803e4967190,
> td=0xfffff802b5d2a000) at file.h:343
> #10 0xffffffff804f4e9e in closef (fp=<value optimized out>, td=<value
> optimized out>) at /usr/src/sys/kern/kern_descrip.c:2338
> #11 0xffffffff804f4ab9 in fdescfree (td=0xfffff80014b094a0) at
> /usr/src/sys/kern/kern_descrip.c:2106
> #12 0xffffffff805013a9 in exit1 (td=0xfffff80014b094a0, rv=<value
> optimized out>) at /usr/src/sys/kern/kern_exit.c:369
> #13 0xffffffff80500e3e in sys_sys_exit (td=0xfffffe000782e060,
> uap=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:179
> #14 0xffffffff80775efd in amd64_syscall (td=0xfffff80014b094a0,
> traced=0) at subr_syscall.c:134
> #15 0xffffffff8075b3db in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:396
> #16 0x000000080120335a in ?? ()
>
> Most of the dumps list 'pickup' as current process. All of them have
> 'kqueue_close' in the backtrace.
> I'm not sure what the next step in diagnosing the issue is. Any pointers
> would be greatly appreciated.

What is the exact revision of the checkout you are running where the panic
above occurs?

Please load kernel.debug + vmcore into kgdb, go to frame 8, and do
p *kq
p *kn
p i
p kq->kq_knlist[i].slh_first
p *(kq->kq_knlist[i].slh_first)
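
For anyone following along, the print expressions above look at an array of singly linked list heads, where kq->kq_knlist[i].slh_first is the first element of list i. Below is a minimal userland sketch of that layout, not the kernel's code. The names toy_knote, toy_klist, toy_kqueue, toy_init, toy_add and toy_drain are hypothetical; only the SLIST macros from <sys/queue.h> and the field names kq_knlist and slh_first are taken from the thread, and the drain loop is only assumed to resemble what happens when a kqueue is closed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a knote; the real structure has many more fields. */
struct toy_knote {
    int id;
    SLIST_ENTRY(toy_knote) link;    /* next pointer within one list */
};

/* Declares struct toy_klist with a single member: struct toy_knote *slh_first. */
SLIST_HEAD(toy_klist, toy_knote);

/* Hypothetical stand-in for a kqueue: an array of list heads indexed by i. */
struct toy_kqueue {
    struct toy_klist *kq_knlist;
    int kq_knlistsize;
};

/* Allocate 'size' empty lists. Returns 0 on success, -1 on failure. */
int toy_init(struct toy_kqueue *kq, int size)
{
    kq->kq_knlist = calloc((size_t)size, sizeof(*kq->kq_knlist));
    if (kq->kq_knlist == NULL)
        return -1;
    kq->kq_knlistsize = size;
    for (int i = 0; i < size; i++)
        SLIST_INIT(&kq->kq_knlist[i]);
    return 0;
}

/* Insert a new element at the head of list i. Returns 0 on success. */
int toy_add(struct toy_kqueue *kq, int i, int id)
{
    assert(i >= 0 && i < kq->kq_knlistsize);
    struct toy_knote *kn = malloc(sizeof(*kn));
    if (kn == NULL)
        return -1;
    kn->id = id;
    SLIST_INSERT_HEAD(&kq->kq_knlist[i], kn, link);
    return 0;
}

/*
 * Walk every list and free each element, then free the array itself.
 * Each head must be NULL or a valid pointer; dereferencing a stale or
 * corrupted slh_first here is the kind of access that would fault.
 * Returns the number of elements freed.
 */
int toy_drain(struct toy_kqueue *kq)
{
    int freed = 0;
    for (int i = 0; i < kq->kq_knlistsize; i++) {
        struct toy_knote *kn;
        while ((kn = SLIST_FIRST(&kq->kq_knlist[i])) != NULL) {
            SLIST_REMOVE_HEAD(&kq->kq_knlist[i], link);
            free(kn);
            freed++;
        }
    }
    free(kq->kq_knlist);
    kq->kq_knlist = NULL;
    kq->kq_knlistsize = 0;
    return freed;
}
```

In this sketch, p kq->kq_knlist[i].slh_first corresponds to reading the head pointer of list i, and p *(kq->kq_knlist[i].slh_first) to dereferencing it. If the head pointer printed from the vmcore is neither NULL nor a plausible kernel address, that would point at the list itself being damaged rather than at the element contents.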