Patrick Lamaiziere
2013-Jul-14 09:59 UTC
(9.2) panic under disk load (gam_server / knlist_remove_kq)
9.2 PRERELEASE (today) / amd64

Hello,

I'm seeing a panic while trying to build a poudriere repository.

As far as I can see, it always happens when gam_server is started (i.e. xfce is running) and under disk load (poudriere bulk build). (That is something new; the box was pretty stable.)

The complete crash dump (core.0.txt) is here:
http://user.lamaiziere.net/patrick/panic_gam_server.txt

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x58
fault code = supervisor read data, page not present

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x58
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff808f1bf1
stack pointer = 0x28:0xffffff8108e12a40
frame pointer = 0x28:0xffffff8108e12a70
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 23557 (gam_server)
trap number = 12
panic: page fault
cpuid = 1
...
#0 doadump (textdump=<value optimized out>) at pcpu.h:234
234 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) #0 doadump (textdump=<value optimized out>) at pcpu.h:234
#1 0xffffffff8092e4d6 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:449
#2 0xffffffff8092e9d7 in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:637
#3 0xffffffff80d13030 in trap_fatal (frame=0xc, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:879
#4 0xffffffff80d13391 in trap_pfault (frame=0xffffff8108e12990, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:795
#5 0xffffffff80d13944 in trap (frame=0xffffff8108e12990) at /usr/src/sys/amd64/amd64/trap.c:463
#6 0xffffffff80cfcc73 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
#7 0xffffffff808f1bf1 in knlist_remove_kq (knl=0x30, kn=0xfffffe003b70b280, knlislocked=0, kqislocked=0) at /usr/src/sys/kern/kern_event.c:1847
#8 0xffffffff808f4a5b in knote_fdclose (td=0xfffffe0009a34490, fd=9924) at /usr/src/sys/kern/kern_event.c:2065
#9 0xffffffff808ea573 in kern_close (td=0xfffffe0009a34490, fd=9924) at /usr/src/sys/kern/kern_descrip.c:1250
#10 0xffffffff80d127da in amd64_syscall (td=0xfffffe0009a34490, traced=0) at subr_syscall.c:135
#11 0xffffffff80cfcf57 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:391
#12 0x00000008019e9a9c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

Thanks, regards
Patrick Lamaiziere
2013-Jul-14 14:33 UTC
(9.2) panic under disk load (gam_server / knlist_remove_kq)
On Sun, 14 Jul 2013 11:59:53 +0200, Patrick Lamaiziere <patfbsd at davenulle.org> wrote:

Hello,

> 9.2 PRERELEASE (today) / amd64
>
> Hello,
>
> I'm seeing a panic while trying to build a poudriere repository.
>
> As far I can see it always happens when gam_server is started (ie
> xfce is running) and under disk load (poudriere bulk build) :
> (That is something new, the box was pretty stable)
>
> the complete crash dump (core.0.txt) is here:
> http://user.lamaiziere.net/patrick/panic_gam_server.txt

With WITNESS and ASSERTION on, I see a warning that looks related:

Jul 14 16:23:29 roxette kernel: WARNING: destroying knlist w/ knotes on it!

and the box panics just after this.

Also, there are two LORs just before the panic; I don't know if they are related or not:

Jul 14 16:23:29 roxette kernel: lock order reversal:
Jul 14 16:23:29 roxette kernel: 1st 0xfffffe0013335878 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1240
Jul 14 16:23:29 roxette kernel: 2nd 0xfffffe00495a5488 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2335
Jul 14 16:23:29 roxette kernel: KDB: stack backtrace:
Jul 14 16:23:29 roxette kernel: #0 0xffffffff8094bc36 at kdb_backtrace+0x66
Jul 14 16:23:29 roxette kernel: #1 0xffffffff809603be at _witness_debugger+0x2e
Jul 14 16:23:29 roxette kernel: #2 0xffffffff80961a95 at witness_checkorder+0x865
Jul 14 16:23:29 roxette kernel: #3 0xffffffff808f8e21 at __lockmgr_args+0x1161
Jul 14 16:23:29 roxette kernel: #4 0xffffffff8099f739 at vop_stdlock+0x39
Jul 14 16:23:29 roxette kernel: #5 0xffffffff80d93593 at VOP_LOCK1_APV+0xe3
Jul 14 16:23:29 roxette kernel: #6 0xffffffff809c0727 at _vn_lock+0x47
Jul 14 16:23:29 roxette kernel: #7 0xffffffff809b42d8 at vputx+0x328
Jul 14 16:23:29 roxette kernel: #8 0xffffffff809a88d4 at dounmount+0x294
Jul 14 16:23:29 roxette kernel: #9 0xffffffff809a914e at sys_unmount+0x3ce
Jul 14 16:23:29 roxette kernel: #10 0xffffffff80cec439 at amd64_syscall+0x2f9
Jul 14 16:23:29 roxette kernel: #11 0xffffffff80cd6d57 at Xfast_syscall+0xf7

Jul 14 16:23:29 roxette kernel: lock order reversal:
Jul 14 16:23:29 roxette kernel: 1st 0xfffffe006e1eac68 ufs (ufs) @ /usr/src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:620
Jul 14 16:23:29 roxette kernel: 2nd 0xffffffff813ebda0 allproc (allproc) @ /usr/src/sys/kern/kern_descrip.c:2822
Jul 14 16:23:29 roxette kernel: KDB: stack backtrace:
Jul 14 16:23:29 roxette kernel: #0 0xffffffff8094bc36 at kdb_backtrace+0x66
Jul 14 16:23:29 roxette kernel: #1 0xffffffff809603be at _witness_debugger+0x2e
Jul 14 16:23:29 roxette kernel: #2 0xffffffff80961a95 at witness_checkorder+0x865
Jul 14 16:23:29 roxette kernel: #3 0xffffffff8091b1fa at _sx_slock+0x5a
Jul 14 16:23:29 roxette kernel: #4 0xffffffff808d30ff at mountcheckdirs+0x3f
Jul 14 16:23:29 roxette kernel: #5 0xffffffff809a891f at dounmount+0x2df
Jul 14 16:23:29 roxette kernel: #6 0xffffffff809a914e at sys_unmount+0x3ce
Jul 14 16:23:29 roxette kernel: #7 0xffffffff80cec439 at amd64_syscall+0x2f9
Jul 14 16:23:29 roxette kernel: #8 0xffffffff80cd6d57 at Xfast_syscall+0xf7
Jul 14 16:23:29 roxette kernel: WARNING: destroying knlist w/ knotes on it!

Thanks, regards
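
For readers comparing several witness reports like the ones in this message, the lock pairs can be pulled out of the log mechanically. The following is only a sketch: the function name lock_order_reversals and the regular expression are illustrative assumptions based on the line format shown in this message, not part of any FreeBSD tool.

```python
import re

# Matches witness lines such as:
#   "1st 0xfffffe0013335878 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1240"
# The pattern is an assumption derived from the log excerpt in this thread.
LOCK_RE = re.compile(r"(1st|2nd) 0x[0-9a-f]+ (\S+) \((\S+)\) @ (\S+):(\d+)")


def lock_order_reversals(log_text):
    """Return (first_lock, second_lock) name pairs, one per LOR report."""
    pairs = []
    first = None
    for order, name, _cls, _path, _line in LOCK_RE.findall(log_text):
        if order == "1st":
            first = name
        elif first is not None:
            pairs.append((first, name))
            first = None
    return pairs


# Sample lines copied from the message above.
sample = """\
Jul 14 16:23:29 roxette kernel: lock order reversal:
Jul 14 16:23:29 roxette kernel: 1st 0xfffffe0013335878 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1240
Jul 14 16:23:29 roxette kernel: 2nd 0xfffffe00495a5488 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2335
Jul 14 16:23:29 roxette kernel: lock order reversal:
Jul 14 16:23:29 roxette kernel: 1st 0xfffffe006e1eac68 ufs (ufs) @ /usr/src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:620
Jul 14 16:23:29 roxette kernel: 2nd 0xffffffff813ebda0 allproc (allproc) @ /usr/src/sys/kern/kern_descrip.c:2822
"""

if __name__ == "__main__":
    print(lock_order_reversals(sample))
```

Both reversals here are reported from the unmount path (dounmount), which the pairs alone do not show; the backtraces still need to be read for that.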
Dag-Erling Smørgrav
2013-Sep-14 13:40 UTC
Reproducible panic in 9.2-RC4 (was: Re: (9.2) panic under disk load (gam_server / knlist_remove_kq))
Patrick Lamaiziere <patfbsd at davenulle.org> writes:

> I'm seeing a panic while trying to build a poudriere repository.
> [...]

A related panic still occurs in 9.2-RC4. It is 100% reproducible: just start a poudriere build while Gnome is running. The culprit this time seems to be gvfsd-trash. Killing it doesn't work, because Gnome will restart it, but stopping it (pkill -STOP gvfsd-trash) does.

I merged r254024 from stable/9 to see if it would help; it didn't. I get the following core.txt:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x368
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff808f920d
stack pointer = 0x28:0xffffff825217c770
frame pointer = 0x28:0xffffff825217c7e0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1707 (gvfsd-trash)
trap number = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80947986 at kdb_backtrace+0x66
#1 0xffffffff8090d9ae at panic+0x1ce
#2 0xffffffff80cf2110 at trap_fatal+0x290
#3 0xffffffff80cf2471 at trap_pfault+0x211
#4 0xffffffff80cf2a24 at trap+0x344
#5 0xffffffff80cdbd53 at calltrap+0x8
#6 0xffffffff809ab5e3 at filt_vfsvnode+0xf3
#7 0xffffffff808d1d46 at kqueue_register+0x3e6
#8 0xffffffff808d2396 at kern_kevent+0x106
#9 0xffffffff808d2ed0 at sys_kevent+0x90
#10 0xffffffff80cf18ba at amd64_syscall+0x5ea
#11 0xffffffff80cdc037 at Xfast_syscall+0xf7

but gdb gives a slightly different backtrace:

#0 doadump (textdump=<value optimized out>) at pcpu.h:234
#1 0xffffffff8090d486 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:449
#2 0xffffffff8090d987 in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:637
#3 0xffffffff80cf2110 in trap_fatal (frame=0xc, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:879
#4 0xffffffff80cf2471 in trap_pfault (frame=0xffffff825217c6c0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:795
#5 0xffffffff80cf2a24 in trap (frame=0xffffff825217c6c0) at /usr/src/sys/amd64/amd64/trap.c:463
#6 0xffffffff80cdbd53 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
#7 0xffffffff808f920d in _mtx_lock_sleep (m=0xfffffe01068564b8, tid=18446741882246426624, opts=<value optimized out>, file=<value optimized out>, line=0) at /usr/src/sys/kern/kern_mutex.c:388
#8 0xffffffff809ab5e3 in filt_vfsvnode (kn=0xfffffe0136e11d00, hint=0) at /usr/src/sys/kern/vfs_subr.c:4600
#9 0xffffffff808d1d46 in kqueue_register (kq=0xfffffe001a73ec00, kev=0xffffff825217c980, td=0xfffffe01c29e7000, waitok=1) at /usr/src/sys/kern/kern_event.c:1136
#10 0xffffffff808d2396 in kern_kevent (td=0xfffffe01c29e7000, fd=<value optimized out>, nchanges=7, nevents=1, k_ops=0xffffff825217caa0, timeout=0x0) at /usr/src/sys/kern/kern_event.c:847
#11 0xffffffff808d2ed0 in sys_kevent (td=0xfffffe01c29e7000, uap=0xffffff825217cbb0) at /usr/src/sys/kern/kern_event.c:768
#12 0xffffffff80cf18ba in amd64_syscall (td=0xfffffe01c29e7000, traced=0) at subr_syscall.c:135
#13 0xffffffff80cdc037 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:391
#14 0x0000000802e84d8c in ?? ()
Previous frame inner to this frame (corrupt stack?)

This is GENERIC built from the latest releng/9.2 sources with no other modifications than the addition of r254024. With the stock kernel (from freebsd-update), I get

#0 0xffffffff80947986 at kdb_backtrace+0x66
#1 0xffffffff8090d9ae at panic+0x1ce
#2 0xffffffff80cf20d0 at trap_fatal+0x290
#3 0xffffffff80cf2431 at trap_pfault+0x211
#4 0xffffffff80cf29e4 at trap+0x344
#5 0xffffffff80cdbd13 at calltrap+0x8
#6 0xffffffff808d2396 at kern_kevent+0x106
#7 0xffffffff808d2ed0 at sys_kevent+0x90
#8 0xffffffff80cf187a at amd64_syscall+0x5ea
#9 0xffffffff80cdbff7 at Xfast_syscall+0xf7

but kgdb is useless - it doesn't see past calltrap().

DES
--
Dag-Erling Smørgrav - des at des.no
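
The workaround described in this message (stopping gvfsd-trash instead of killing it, so that Gnome does not restart it) could be scripted around a build roughly as follows. This is a minimal sketch, not something from the thread: the helper name paused is hypothetical, and the caller is assumed to have looked up the PIDs already (for example with pgrep).

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def paused(pids):
    """Send SIGSTOP to each PID on entry and SIGCONT on exit.

    Hypothetical helper: it mirrors `pkill -STOP <name>` followed later
    by `pkill -CONT <name>`, but takes explicit PIDs.
    """
    stopped = []
    try:
        for pid in pids:
            os.kill(pid, signal.SIGSTOP)
            stopped.append(pid)
        yield stopped
    finally:
        # Resume everything that was actually stopped, even if the body raised.
        for pid in stopped:
            try:
                os.kill(pid, signal.SIGCONT)
            except ProcessLookupError:
                pass  # the process went away while it was stopped
```

The build would then run inside a `with paused(pids):` block, and the stopped processes are resumed when the block exits. This only avoids the trigger; it does not address the underlying kernel bug.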