David Cross
2016-Jul-06 15:30 UTC
Reproducible panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST)
No kernel messages before (if there were, I would have written this off a long time ago). And as of right now, this is probably the most fsck-ed filesystem on the planet!.. I have an 'image' that I am working from that is ggate-mounted, so I can access it in a bhyve VM to ease debugging, so I am not crashing my real machine (with the real filesystem) all the time.

One of my initial guesses was that this was a CG allocation error, but a dumpfs seems to show plenty of blocks in the CG to meet this need.

Quick note on the testcase: I haven't totally isolated it yet, but the minimal reproduction is a 'ctl_cyrusdb -r', which runs a bdb5 recover op. A ktrace of that shows that it unlinks 3 files, opens them, lseeks them, writes a block, and then mmaps them (but leaves them open). At process termination it munmaps, and then closes. I have tried to write a shorter reproduction that opens, seeks, mmaps (with the same flags), writes the mmaped memory, munmaps, closes and exits, but this has been insufficient to reproduce the issue; there is likely some specific pattern in the bdb5 code tickling this, and behind the mmap-ed interface it is all opaque, and the bdb5 code is pretty complex itself.
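A skeleton of that attempted short reproduction, sketched from the ktrace description above; the path, block size, and the MAP_SHARED flag here are stand-in guesses, not values confirmed from the actual ktrace:

    /* repro-sketch.c: unlink, open, lseek, write a block, mmap, dirty,
     * munmap, close -- the syscall pattern from the ktrace.  Does NOT
     * trigger the panic on its own. */
    #include <sys/mman.h>
    #include <err.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            const char *path = "repro.db";  /* stand-in for bdb's __db.00X */
            char block[8192];               /* "writes a block"; real size unknown */
            size_t len = 24 * 1024;         /* __db.002 was 24K */
            char *p;
            int fd;

            (void)unlink(path);             /* the recover op unlinks first */
            if ((fd = open(path, O_RDWR | O_CREAT, 0600)) == -1)
                    err(1, "open");
            memset(block, 0, sizeof(block));
            /* lseek, then write one block, extending the file to 24K */
            if (lseek(fd, (off_t)(len - sizeof(block)), SEEK_SET) == -1)
                    err(1, "lseek");
            if (write(fd, block, sizeof(block)) != (ssize_t)sizeof(block))
                    err(1, "write");
            /* MAP_SHARED is a guess; bdb5 may use different flags */
            p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");
            memset(p, 0xa5, len);           /* dirty the mapped pages */
            if (munmap(p, len) == -1)
                    err(1, "munmap");
            close(fd);                      /* fd stayed open across the mmap */
            return (0);
    }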
On Wed, Jul 6, 2016 at 11:18 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> On Wed, Jul 06, 2016 at 10:51:28AM -0400, David Cross wrote:
> > Ok.. to reply to my own message: using ktr and debugging printfs I have
> > found the culprit.. but I am still at a loss as to 'why', or what the
> > appropriate fix is.
> >
> > Let's go back to the panic (simplified):
> >
> > #0  0xffffffff8043f160 at kdb_backtrace+0x60
> > #1  0xffffffff80401454 at vpanic+0x124
> > #2  0xffffffff804014e3 at panic+0x43
> > #3  0xffffffff8060719a at softdep_deallocate_dependencies+0x6a
> > #4  0xffffffff80499cc1 at brelse+0x151
> > #5  0xffffffff804979b1 at bufwrite+0x81
> > #6  0xffffffff80623c80 at ffs_write+0x4b0
> > #7  0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
> > #8  0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
> > #9  0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
> > #10 0xffffffff80661cc1 at vnode_pager_putpages+0x91
> > #11 0xffffffff806588e6 at vm_pageout_flush+0x116
> > #12 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
> > #13 0xffffffff80651519 at vm_object_page_clean+0x179
> > #14 0xffffffff80651102 at vm_object_terminate+0xa2
> > #15 0xffffffff806621a5 at vnode_destroy_vobject+0x85
> > #16 0xffffffff8062a52f at ufs_reclaim+0x1f
> > #17 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
> >
> > Via KTR logging I determined that the dangling dependency was on a
> > freshly allocated buf, *after* vinvalbuf in vgonel() (so in VOP_RECLAIM
> > itself), called by the vnode lru cleanup process; I further noticed that
> > it was in a newbuf that recycled a bp (unimportant, except it let me
> > narrow down my logging to something manageable). From there I get this
> > stacktrace (simplified):
> >
> > #0  0xffffffff8043f160 at kdb_backtrace+0x60
> > #1  0xffffffff8049c98e at getnewbuf+0x4be
> > #2  0xffffffff804996a0 at getblk+0x830
> > #3  0xffffffff805fb207 at ffs_balloc_ufs2+0x1327
> > #4  0xffffffff80623b0b at ffs_write+0x33b
> > #5  0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
> > #6  0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
> > #7  0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
> > #8  0xffffffff80661cc1 at vnode_pager_putpages+0x91
> > #9  0xffffffff806588e6 at vm_pageout_flush+0x116
> > #10 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
> > #11 0xffffffff80651519 at vm_object_page_clean+0x179
> > #12 0xffffffff80651102 at vm_object_terminate+0xa2
> > #13 0xffffffff806621a5 at vnode_destroy_vobject+0x85
> > #14 0xffffffff8062a52f at ufs_reclaim+0x1f
> > #15 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
> > #16 0xffffffff804b6c6e at vgonel+0x2ee
> > #17 0xffffffff804ba6f5 at vnlru_proc+0x4b5
> >
> > addr2line on the ffs_balloc_ufs2 address gives
> > /usr/src/sys/ufs/ffs/ffs_balloc.c:778:
> >
> >                 bp = getblk(vp, lbn, nsize, 0, 0, gbflags);
> >                 bp->b_blkno = fsbtodb(fs, newb);
> >                 if (flags & BA_CLRBUF)
> >                         vfs_bio_clrbuf(bp);
> >                 if (DOINGSOFTDEP(vp))
> >                         softdep_setup_allocdirect(ip, lbn, newb, 0,
> >                             nsize, 0, bp);
> >
> > Boom, freshly allocated buffer with a dependency; nothing in VOP_RECLAIM
> > handles this. This is after vinvalbuf is called; it expects that
> > everything is flushed to disk and it's just about releasing structures
> > (that is my read of the code).
> >
> > Now, perhaps this is a good assumption? The question then is how this
> > buffer is hanging out there, surviving a vinvalbuf. I will note that my
> > test-case that finds this runs and terminates *minutes* before... it's
> > not just hanging out there in a race, it's surviving background sync,
> > fsync, etc... wtf? Also, I *can* unmount the FS without an error, so
> > that codepath is either ignoring this buffer, or it's forcing a sync in
> > a way that doesn't panic?
> Most typical cause for the buffer dependencies not flushed is a buffer
> write error. At least you could provide the printout of the buffer to
> confirm or reject this assumption.
>
> Were there any kernel messages right before the panic ? Just in case,
> did you fsck the volume before using it, after the previous panic ?
>
> > Anyone have next steps? I am making progress here, but it's really slow
> > going; this is probably the most complex portion of the kernel, and some
> > pointers would be helpful.
> >
> > On Sat, Jul 2, 2016 at 2:31 PM, David Cross <dcrosstech at gmail.com> wrote:
> >
> > > Ok, I have been trying to trace this down for a while.. I know quite
> > > a bit about it.. but there's a lot I don't know, or I would have a
> > > patch. I have been trying to solve this on my own, but bringing in
> > > some outside assistance will let me move on with my life.
> > >
> > > First up: the stacktrace (from a debugging kernel, with coredump):
> > >
> > > #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
> > > #1  0xffffffff8071018a in kern_reboot (howto=260)
> > >     at /usr/src/sys/kern/kern_shutdown.c:486
> > > #2  0xffffffff80710afc in vpanic (
> > >     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps
> > >     b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d",
> > >     ap=0xfffffe023ae5cf40) at /usr/src/sys/kern/kern_shutdown.c:889
> > > #3  0xffffffff807108c0 in panic (
> > >     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps
> > >     b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d")
> > >     at /usr/src/sys/kern/kern_shutdown.c:818
> > > #4  0xffffffff80a7c841 in softdep_deallocate_dependencies (
> > >     bp=0xfffffe01f030e148) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14099
> > > #5  0xffffffff807f793f in buf_deallocate (bp=0xfffffe01f030e148)
> > >     at buf.h:428
> > > #6  0xffffffff807f59c9 in brelse (bp=0xfffffe01f030e148)
> > >     at /usr/src/sys/kern/vfs_bio.c:1599
> > > #7  0xffffffff807f3132 in bufwrite (bp=0xfffffe01f030e148)
> > >     at /usr/src/sys/kern/vfs_bio.c:1180
> > > #8  0xffffffff80ab226a in bwrite (bp=0xfffffe01f030e148) at buf.h:395
> > > #9  0xffffffff80aafb1b in ffs_write (ap=0xfffffe023ae5d2b8)
> > >     at /usr/src/sys/ufs/ffs/ffs_vnops.c:800
> > > #10 0xffffffff80bdf0ed in VOP_WRITE_APV (vop=0xffffffff80f15480,
> > >     a=0xfffffe023ae5d2b8) at vnode_if.c:999
> > > #11 0xffffffff80b1d02e in VOP_WRITE (vp=0xfffff80077e7a000,
> > >     uio=0xfffffe023ae5d378, ioflag=8323232, cred=0xfffff80004235000)
> > >     at vnode_if.h:413
> > > #12 0xffffffff80b1ce97 in vnode_pager_generic_putpages
> > >     (vp=0xfffff80077e7a000, ma=0xfffffe023ae5d660, bytecount=16384,
> > >     flags=1, rtvals=0xfffffe023ae5d580)
> > >     at /usr/src/sys/vm/vnode_pager.c:1138
> > > #13 0xffffffff80805a57 in vop_stdputpages (ap=0xfffffe023ae5d478)
> > >     at /usr/src/sys/kern/vfs_default.c:760
> > > #14 0xffffffff80be201e in VOP_PUTPAGES_APV (vop=0xffffffff80f00218,
> > >     a=0xfffffe023ae5d478) at vnode_if.c:2861
> > > #15 0xffffffff80b1d7e3 in VOP_PUTPAGES (vp=0xfffff80077e7a000,
> > >     m=0xfffffe023ae5d660, count=16384, sync=1, rtvals=0xfffffe023ae5d580,
> > >     offset=0) at vnode_if.h:1189
> > > #16 0xffffffff80b196f3 in vnode_pager_putpages (object=0xfffff8014a1fce00,
> > >     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
> > >     at /usr/src/sys/vm/vnode_pager.c:1016
> > > #17 0xffffffff80b0a605 in vm_pager_put_pages (object=0xfffff8014a1fce00,
> > >     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
> > >     at vm_pager.h:144
> > > #18 0xffffffff80b0a18c in vm_pageout_flush (mc=0xfffffe023ae5d660,
> > >     count=4, flags=1, mreq=0, prunlen=0xfffffe023ae5d6f8,
> > >     eio=0xfffffe023ae5d77c) at /usr/src/sys/vm/vm_pageout.c:533
> > > #19 0xffffffff80afec76 in vm_object_page_collect_flush (
> > >     object=0xfffff8014a1fce00, p=0xfffff8023a882370, pagerflags=1,
> > >     flags=1, clearobjflags=0xfffffe023ae5d780, eio=0xfffffe023ae5d77c)
> > >     at /usr/src/sys/vm/vm_object.c:971
> > > #20 0xffffffff80afe91e in vm_object_page_clean (object=0xfffff8014a1fce00,
> > >     start=0, end=0, flags=1) at /usr/src/sys/vm/vm_object.c:897
> > > #21 0xffffffff80afe1fa in vm_object_terminate (object=0xfffff8014a1fce00)
> > >     at /usr/src/sys/vm/vm_object.c:735
> > > #22 0xffffffff80b1a0f1 in vnode_destroy_vobject (vp=0xfffff80077e7a000)
> > >     at /usr/src/sys/vm/vnode_pager.c:164
> > > #23 0xffffffff80abb191 in ufs_prepare_reclaim (vp=0xfffff80077e7a000)
> > >     at /usr/src/sys/ufs/ufs/ufs_inode.c:190
> > > #24 0xffffffff80abb1f9 in ufs_reclaim (ap=0xfffffe023ae5d968)
> > >     at /usr/src/sys/ufs/ufs/ufs_inode.c:219
> > > #25 0xffffffff80be0ade in VOP_RECLAIM_APV (vop=0xffffffff80f15ec0,
> > >     a=0xfffffe023ae5d968) at vnode_if.c:2019
> > > #26 0xffffffff80827849 in VOP_RECLAIM (vp=0xfffff80077e7a000,
> > >     td=0xfffff80008931960) at vnode_if.h:830
> > > #27 0xffffffff808219a9 in vgonel (vp=0xfffff80077e7a000)
> > >     at /usr/src/sys/kern/vfs_subr.c:2943
> > > #28 0xffffffff808294e8 in vlrureclaim (mp=0xfffff80008b2e000)
> > >     at /usr/src/sys/kern/vfs_subr.c:882
> > > #29 0xffffffff80828ea9 in vnlru_proc ()
> > >     at /usr/src/sys/kern/vfs_subr.c:1000
> > > #30 0xffffffff806b66c5 in fork_exit (callout=0xffffffff80828c50
> > >     <vnlru_proc>, arg=0x0, frame=0xfffffe023ae5dc00)
> > >     at /usr/src/sys/kern/kern_fork.c:1027
> > > #31 0xffffffff80b21dce in fork_trampoline ()
> > >     at /usr/src/sys/amd64/amd64/exception.S:611
> > > #32 0x0000000000000000 in ?? ()
> > >
> > > This is a kernel compiled -O -g; it's "almost" GENERIC, the only
> > > difference is some removed drivers. I have reproduced this on a few
> > > different kernels, including a bhyve one, so I can poke at it and not
> > > take out the main machine. The reproduction as it currently stands
> > > needs to have jails running, but I don't believe this is a jail
> > > interaction; I think it's just that the process that sets up the
> > > problem happens to be running in a jail. The step is: start the jail,
> > > then run "find /mountpoint -xdev >/dev/null" on the filesystem; when
> > > vnlru forces the problem vnode out, the system panics.
> > >
> > > I made a few modifications to the kernel to spit out information
> > > about the buf that causes the issue, but that is it.
> > >
> > > Information about the buf in question; it has a single soft-dependency
> > > worklist item, for direct allocation:
> > >
> > > (kgdb) print *bp->b_dep->lh_first
> > > $6 = {wk_list = {le_next = 0x0, le_prev = 0xfffffe01f030e378},
> > >   wk_mp = 0xfffff80008b2e000, wk_type = 4, wk_state = 163841}
> > >
> > > The file that maps to that buffer:
> > >
> > > ls -lh MOUNTPOINT/jails/mail/var/imap/db/__db.002
> > > -rw-------  1 cyrus  cyrus   24K Jul  1 20:32 MOUNTPOINT/jails/mail/var/imap/db/__db.002
> > >
> > > Any help is appreciated; until then I will keep banging my head
> > > against the proverbial wall on this :)
David Cross
2016-Jul-06 16:02 UTC
Reproducible panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST)
Oh, whoops; how do I print out the buffer?
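What Konstantin is presumably asking for is something along these lines in kgdb against the core; a sketch, assuming the brelse frame (frame 6 in the kgdb backtrace upthread) is selected and bp is in scope. A nonzero b_error, with BIO_ERROR (0x1) set in b_ioflags, would support the failed-write theory:

    (kgdb) frame 6                    # brelse (bp=0xfffffe01f030e148)
    (kgdb) print *bp                  # the whole struct buf
    (kgdb) print bp->b_error          # errno from a failed write, if any
    (kgdb) print /x bp->b_ioflags     # BIO_ERROR is 0x1
    (kgdb) print /x bp->b_flags
    (kgdb) print *bp->b_dep.lh_first  # the attached softdep work item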
On Wed, Jul 6, 2016 at 11:30 AM, David Cross <dcrosstech at gmail.com> wrote:
> On Wed, Jul 6, 2016 at 11:18 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
>> Most typical cause for the buffer dependencies not flushed is a buffer
>> write error. At least you could provide the printout of the buffer to
>> confirm or reject this assumption.
>>
>> Were there any kernel messages right before the panic ? Just in case,
>> did you fsck the volume before using it, after the previous panic ?