Frank Razenberg
2015-Oct-14 13:52 UTC
10.2-STABLE amd64 panic: page fault while in kernel mode
After upgrading from 9.2 to 10.1 I first started noticing panics. They occurred roughly weekly, and since this storage machine isn't frequently used I didn't look into it much further. After updating to 10.2-STABLE the panics have gone from weekly to daily.

The machine has 32GB of non-registered ECC DDR3-1066 RAM. There's also a 10-disk raidz2 pool. I've run memtest86+ for 72 hours straight with no errors.

Crash dumps all feature the following:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 12
fault virtual address   = 0x1d1c0bec0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff804fda65
stack pointer           = 0x28:0xfffffe0698f21870
frame pointer           = 0x28:0xfffffe0698f218d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6106 (pickup)
trap number             = 12
panic: page fault
cpuid = 2

(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff8053ce32 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:455
#2  0xffffffff8053d215 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:762
#3  0xffffffff8053d0a3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:691
#4  0xffffffff807755db in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
#5  0xffffffff807758dd in trap_pfault (frame=0xfffffe0698dbc7c0, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:674
#6  0xffffffff80774f7a in trap (frame=0xfffffe0698dbc7c0) at /usr/src/sys/amd64/amd64/trap.c:440
#7  0xffffffff8075b0f2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff804fda65 in kqueue_close (fp=0xfffff803e4967190, td=0xfffff80014b094a0) at /usr/src/sys/kern/kern_event.c:1750
#9  0xffffffff804f25f9 in _fdrop (fp=0xfffff803e4967190, td=0xfffff802b5d2a000) at file.h:343
#10 0xffffffff804f4e9e in closef (fp=<value optimized out>, td=<value optimized out>) at /usr/src/sys/kern/kern_descrip.c:2338
#11 0xffffffff804f4ab9 in fdescfree (td=0xfffff80014b094a0) at /usr/src/sys/kern/kern_descrip.c:2106
#12 0xffffffff805013a9 in exit1 (td=0xfffff80014b094a0, rv=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:369
#13 0xffffffff80500e3e in sys_sys_exit (td=0xfffffe000782e060, uap=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:179
#14 0xffffffff80775efd in amd64_syscall (td=0xfffff80014b094a0, traced=0) at subr_syscall.c:134
#15 0xffffffff8075b3db in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#16 0x000000080120335a in ?? ()

Most of the dumps list 'pickup' as current process. All of them have 'kqueue_close' in the backtrace.

I'm not sure what the next step in diagnosing the issue is. Any pointers would be greatly appreciated.

-Frank
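The observation that every dump has 'kqueue_close' in its backtrace can be checked mechanically when there are many dumps. The following is only a rough sketch: it assumes each backtrace has been saved as text from the kgdb `bt` command, and the helper names `frame_functions` and `common_functions` are made up for this illustration.

```python
import re

# Matches kgdb/gdb backtrace frames in either of these shapes:
#   "#8  0xffffffff804fda65 in kqueue_close (fp=...) at ...:1750"
#   "#0  doadump (textdump=...) at pcpu.h:219"
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def frame_functions(bt_text):
    """Return the function names of one backtrace, ordered by frame number."""
    frames = {int(num): name for num, name in FRAME_RE.findall(bt_text)}
    return [frames[num] for num in sorted(frames)]


def common_functions(backtraces):
    """Return the set of function names that appear in every backtrace."""
    name_sets = [set(frame_functions(bt)) for bt in backtraces]
    return set.intersection(*name_sets) if name_sets else set()


if __name__ == "__main__":
    # Two shortened, made-up backtraces in the same format as the one above.
    dump_a = (
        "#0 doadump (textdump=1) at pcpu.h:219\n"
        "#1 0xffffffff804fda65 in kqueue_close (fp=0x1, td=0x2) at kern_event.c:1750\n"
        "#2 0xffffffff804f25f9 in _fdrop (fp=0x1, td=0x2) at file.h:343\n"
    )
    dump_b = (
        "#0 doadump (textdump=1) at pcpu.h:219\n"
        "#1 0xffffffff804fda65 in kqueue_close (fp=0x3, td=0x4) at kern_event.c:1750\n"
        "#2 0xffffffff804f4e9e in closef (fp=0x3, td=0x4) at kern_descrip.c:2338\n"
    )
    print(sorted(common_functions([dump_a, dump_b])))
```

Frames such as doadump or panic will be common to any crash dump, so the interesting names are the ones that remain after ignoring the generic panic and trap path.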
Konstantin Belousov
2015-Oct-14 14:42 UTC
10.2-STABLE amd64 panic: page fault while in kernel mode
On Wed, Oct 14, 2015 at 03:52:47PM +0200, Frank Razenberg wrote:
> After upgrading from 9.2 to 10.1 I first started noticing panics. They
> occurred roughly weekly, and since this storage machine isn't frequently
> used I didn't look into it much further. After updating to 10.2-STABLE
> the panics have gone from weekly to daily.
> The machine has 32GB of non-registered ECC DDR3-1066 RAM. There's also a
> 10-disk raidz2 pool. I've run memtest86+ for 72 hours straight with no
> errors.
>
> Crash dumps all feature the following:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 12
> fault virtual address = 0x1d1c0bec0
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff804fda65
> stack pointer = 0x28:0xfffffe0698f21870
> frame pointer = 0x28:0xfffffe0698f218d0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 6106 (pickup)
> trap number = 12
> panic: page fault
> cpuid = 2
>
> (kgdb) bt
> #0 doadump (textdump=<value optimized out>) at pcpu.h:219
> #1 0xffffffff8053ce32 in kern_reboot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:455
> #2 0xffffffff8053d215 in vpanic (fmt=<value optimized out>, ap=<value
> optimized out>) at /usr/src/sys/kern/kern_shutdown.c:762
> #3 0xffffffff8053d0a3 in panic (fmt=0x0) at
> /usr/src/sys/kern/kern_shutdown.c:691
> #4 0xffffffff807755db in trap_fatal (frame=<value optimized out>,
> eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
> #5 0xffffffff807758dd in trap_pfault (frame=0xfffffe0698dbc7c0,
> usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:674
> #6 0xffffffff80774f7a in trap (frame=0xfffffe0698dbc7c0) at
> /usr/src/sys/amd64/amd64/trap.c:440
> #7 0xffffffff8075b0f2 in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:236
> #8 0xffffffff804fda65 in kqueue_close (fp=0xfffff803e4967190,
> td=0xfffff80014b094a0) at /usr/src/sys/kern/kern_event.c:1750
> #9 0xffffffff804f25f9 in _fdrop (fp=0xfffff803e4967190,
> td=0xfffff802b5d2a000) at file.h:343
> #10 0xffffffff804f4e9e in closef (fp=<value optimized out>, td=<value
> optimized out>) at /usr/src/sys/kern/kern_descrip.c:2338
> #11 0xffffffff804f4ab9 in fdescfree (td=0xfffff80014b094a0) at
> /usr/src/sys/kern/kern_descrip.c:2106
> #12 0xffffffff805013a9 in exit1 (td=0xfffff80014b094a0, rv=<value
> optimized out>) at /usr/src/sys/kern/kern_exit.c:369
> #13 0xffffffff80500e3e in sys_sys_exit (td=0xfffffe000782e060,
> uap=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:179
> #14 0xffffffff80775efd in amd64_syscall (td=0xfffff80014b094a0,
> traced=0) at subr_syscall.c:134
> #15 0xffffffff8075b3db in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:396
> #16 0x000000080120335a in ?? ()
>
> Most of the dumps list 'pickup' as current process. All of them have
> 'kqueue_close' in the backtrace.
> I'm not sure what the next step in diagnosing the issue is. Any pointers
> would be greatly appreciated.

What is the exact revision of the checkout you run, where the panic above
occurs?

Please load the kernel.debug + vmcore into kgdb, go to frame 8, and do
p *kq
p *kn
p i
p kq->kq_knlist[i].slh_first
p *(kq->kq_knlist[i].slh_first)
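When reading the output of those print commands, one thing worth looking at is whether the pointer values look like kernel pointers at all. As a rough illustration, the sketch below compares the fault address from the trap message with the fp and td values from frame 8. It only uses the architectural fact that on amd64 the kernel occupies the canonical upper half of the address space (from 0xffff800000000000 up). The real FreeBSD kernel map and direct map ranges are narrower and are not modelled here, and the helper name `in_upper_half` is invented for this example.

```python
# Start of the canonical upper half of the amd64 virtual address space.
# This is an architectural boundary, not FreeBSD's exact kernel map layout.
UPPER_HALF_START = 0xFFFF800000000000
ADDRESS_MAX = 0xFFFFFFFFFFFFFFFF


def in_upper_half(addr):
    """Return True if addr lies in the canonical upper (kernel) half."""
    return UPPER_HALF_START <= addr <= ADDRESS_MAX


# Values copied from the trap message and from frame 8 of the backtrace.
values = [
    ("fault virtual address", 0x1D1C0BEC0),
    ("fp (frame 8)", 0xFFFFF803E4967190),
    ("td (frame 8)", 0xFFFFF80014B094A0),
]

for name, addr in values:
    kind = "upper half" if in_upper_half(addr) else "NOT in upper half"
    print(f"{name:24s} {addr:#018x}  {kind}")
```

A faulting address outside the upper half while in kernel mode would be consistent with a stale or corrupted pointer being followed, but that is only a guess until the contents of kq and kn have actually been printed.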