Karl Denninger
2017-Jan-11 14:31 UTC
Ugh -- attempted to update this morning, and got a nasty panic in ZFS....
During the reboot, immediately after the daemons started up on the machine (the boot got beyond mounting all the disks and was well into starting up all the background stuff it runs), I got a double-fault.

..... (there were a LOT more of the same; it pretty clearly was a recursive call sequence that ran the system out of stack space)

#294 0xffffffff822fdcfd in zio_execute (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666
#295 0xffffffff8230130e in zio_vdev_io_start (zio=0xfffff8010c8f27b0)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3127
#296 0xffffffff822fdcfd in zio_execute (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666
#297 0xffffffff822e464d in vdev_queue_io_done (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913
#298 0xffffffff823014c9 in zio_vdev_io_done (zio=0xfffff8010cff0b88)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3152
#299 0xffffffff822fdcfd in zio_execute (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666
#300 0xffffffff8230130e in zio_vdev_io_start (zio=0xfffff8010cff0b88)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3127
#301 0xffffffff822fdcfd in zio_execute (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666
#302 0xffffffff822e464d in vdev_queue_io_done (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913
#303 0xffffffff823014c9 in zio_vdev_io_done (zio=0xfffff8010c962000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3152
#304 0xffffffff822fdcfd in zio_execute (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666
#305 0xffffffff8230130e in zio_vdev_io_start (zio=0xfffff8010c962000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3127
#306 0xffffffff822fdcfd in zio_execute (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666
#307 0xffffffff822e464d in vdev_queue_io_done (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:913
#308 0xffffffff823014c9 in zio_vdev_io_done (zio=0xfffff80102175000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3152
#309 0xffffffff822fdcfd in zio_execute (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1666
#310 0xffffffff80b2585a in taskqueue_run_locked (queue=<value optimized out>)
    at /usr/src/sys/kern/subr_taskqueue.c:454
#311 0xffffffff80b26a48 in taskqueue_thread_loop (arg=<value optimized out>)
    at /usr/src/sys/kern/subr_taskqueue.c:724
#312 0xffffffff80a7eb05 in fork_exit (
    callout=0xffffffff80b26960 <taskqueue_thread_loop>,
    arg=0xfffff800b8824c30, frame=0xfffffe0667430c00)
    at /usr/src/sys/kern/kern_fork.c:1040
#313 0xffffffff80f87c3e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#314 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb)

.....

NewFS.denninger.net dumped core - see /var/crash/vmcore.3

Wed Jan 11 08:15:33 CST 2017

FreeBSD NewFS.denninger.net 11.0-STABLE FreeBSD 11.0-STABLE #14 r311927M: Wed Jan 11 07:55:20 CST 2017 karl at NewFS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP amd64

panic: double fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal double fault
rip = 0xffffffff822e3c5d
rsp = 0xfffffe066742af90
rbp = 0xfffffe066742b420
cpuid = 15; apic id = 35
panic: double fault
cpuid = 15
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0649ddee30
vpanic() at vpanic+0x186/frame 0xfffffe0649ddeeb0
panic() at panic+0x43/frame 0xfffffe0649ddef10
dblfault_handler() at dblfault_handler+0xa2/frame 0xfffffe0649ddef30
Xdblfault() at Xdblfault+0xac/frame 0xfffffe0649ddef30
--- trap 0x17, rip = 0xffffffff822e3c5d, rsp = 0xfffffe066742af90, rbp 0xfffffe066742b420 ---

# Work around for this CPU from 11.x errata
vm.pmap.pcid_enabled=0
#
#
# Try to avoid kernel stack exhaustion due to TRIM storms.
kern.kstack_pages="6"

I have kstack_pages set to "6" to try to avoid another panic that I got occasionally during zfs backup operations which appeared to be linked to "too many" TRIMs, and looks very similar to this one.

I rebooted back to kernel.old, which was built in October, and the machine came up normally. I'll try the newer build again and see if this was transient and related to delayed TRIM operations on the disks related to the installworld/installkernel. But if it is then it remains a problem -- and setting kstack_pages didn't help!

I've got the dump if anything in particular would be of help.

The prompt to do this in the first place was the openssh CVE that was recently issued.....

-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
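[Editorial sketch] The stack-exhaustion reading can be sanity-checked with a little arithmetic on the addresses in the dump. The Python below is only an illustration under assumptions that the post does not state: 4 KiB pages, and a kernel stack whose top is the page boundary just above the `frame=` pointer passed to fork_exit in frame #312. The variable names are made up for the example; only the addresses and the kstack_pages value come from the message.

```python
# Assumptions (NOT from the post): 4 KiB pages, and the thread's kernel stack
# ends at the page boundary just above the fork_exit frame pointer.
PAGE_SIZE = 4096
KSTACK_PAGES = 6                       # kern.kstack_pages="6" from the post

FORK_EXIT_FRAME = 0xfffffe0667430c00   # frame= argument in backtrace frame #312
FAULT_RSP = 0xfffffe066742af90         # rsp reported with the double fault
FAULT_RBP = 0xfffffe066742b420         # rbp reported with the double fault


def page_round_up(addr, page_size=PAGE_SIZE):
    """Round an address up to the next page boundary."""
    return (addr + page_size - 1) & ~(page_size - 1)


# Assumed extent of the kernel stack (it grows downward from stack_top).
stack_top = page_round_up(FORK_EXIT_FRAME)
stack_bottom = stack_top - KSTACK_PAGES * PAGE_SIZE

# Bytes consumed between the outermost frame and the faulting stack pointer.
used = FORK_EXIT_FRAME - FAULT_RSP

# Positive value: rsp lies below the assumed bottom of the stack.
overrun = stack_bottom - FAULT_RSP

print(f"assumed stack range : {stack_bottom:#x} .. {stack_top:#x}")
print(f"stack used at fault : {used} bytes of {KSTACK_PAGES * PAGE_SIZE}")
print(f"rsp below bottom by : {overrun} bytes")
print(f"rbp inside stack    : {stack_bottom <= FAULT_RBP < stack_top}")
```

If those assumptions hold, rsp sits just below the lowest of the six stack pages while rbp is still inside it, which is consistent with the recursive zio call chain running off the end of the stack rather than with some unrelated fault.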
Karl Denninger
2017-Jan-11 14:50 UTC
Ugh -- attempted to update this morning, and got a nasty panic in ZFS....
A second attempt to come up on the new kernel was successful -- so this had to be due to queued I/Os that were pending at the time of the shutdown....

On 1/11/2017 08:31, Karl Denninger wrote:
> During the reboot, immediately after the daemons started up on the
> machine (the boot got beyond mounting all the disks and was well into
> starting up all the background stuff it runs), I got a double-fault.

-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/