David Cross
2016-Jul-06 15:30 UTC
Reproducible panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST)
No kernel messages before (if there were, I would have written this off a long time ago). And as of right now, this is probably the most fsck-ed filesystem on the planet!.. I have an 'image' that I am working from that is ggate-mounted, so I can access it in a bhyve VM to ease debugging, so I am not crashing my real machine (with the real filesystem) all the time.

One of my initial guesses was that this was a CG allocation error, but a dumpfs seems to show plenty of blocks in the CG to meet this need.

Quick note on the testcase: I haven't totally isolated it yet, but the minimal reproduction is a 'ctl_cyrusdb -r', which runs a bdb5 recover op. A ktrace of that shows that it unlinks 3 files, opens them, lseeks them, writes a block, and then mmaps them (but leaves them open). At process termination it munmaps, and then closes. I have tried to write a shorter reproduction that opens, seeks, mmaps (with the same flags), writes the mmaped memory, munmaps, closes and exits, but this has been insufficient to reproduce the issue; there is likely some specific pattern in the bdb5 code tickling this, and behind the mmap-ed interface it is all opaque, and the bdb5 code is pretty complex itself.
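A skeleton of that attempted short reproduction, sketched from the ktrace description above; the path, block size, and the MAP_SHARED flag here are stand-in guesses, not values confirmed from the actual ktrace:

    /* repro-sketch.c: unlink, open, lseek, write a block, mmap, dirty,
     * munmap, close -- the syscall pattern from the ktrace.  Does NOT
     * trigger the panic on its own. */
    #include <sys/mman.h>
    #include <err.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            const char *path = "repro.db";  /* stand-in for bdb's __db.00X */
            char block[8192];               /* "writes a block"; real size unknown */
            size_t len = 24 * 1024;         /* __db.002 was 24K */
            char *p;
            int fd;

            (void)unlink(path);             /* the recover op unlinks first */
            if ((fd = open(path, O_RDWR | O_CREAT, 0600)) == -1)
                    err(1, "open");
            memset(block, 0, sizeof(block));
            /* lseek, then write one block, extending the file to 24K */
            if (lseek(fd, (off_t)(len - sizeof(block)), SEEK_SET) == -1)
                    err(1, "lseek");
            if (write(fd, block, sizeof(block)) != (ssize_t)sizeof(block))
                    err(1, "write");
            /* MAP_SHARED is a guess; bdb5 may use different flags */
            p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");
            memset(p, 0xa5, len);           /* dirty the mapped pages */
            if (munmap(p, len) == -1)
                    err(1, "munmap");
            close(fd);                      /* fd stayed open across the mmap */
            return (0);
    }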
On Wed, Jul 6, 2016 at 11:18 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> On Wed, Jul 06, 2016 at 10:51:28AM -0400, David Cross wrote:
> > Ok.. to reply to my own message: using ktr and debugging printfs I have
> > found the culprit.. but I am still at a loss as to 'why', or what the
> > appropriate fix is.
> >
> > Let's go back to the panic (simplified):
> >
> > #0  0xffffffff8043f160 at kdb_backtrace+0x60
> > #1  0xffffffff80401454 at vpanic+0x124
> > #2  0xffffffff804014e3 at panic+0x43
> > #3  0xffffffff8060719a at softdep_deallocate_dependencies+0x6a
> > #4  0xffffffff80499cc1 at brelse+0x151
> > #5  0xffffffff804979b1 at bufwrite+0x81
> > #6  0xffffffff80623c80 at ffs_write+0x4b0
> > #7  0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
> > #8  0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
> > #9  0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
> > #10 0xffffffff80661cc1 at vnode_pager_putpages+0x91
> > #11 0xffffffff806588e6 at vm_pageout_flush+0x116
> > #12 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
> > #13 0xffffffff80651519 at vm_object_page_clean+0x179
> > #14 0xffffffff80651102 at vm_object_terminate+0xa2
> > #15 0xffffffff806621a5 at vnode_destroy_vobject+0x85
> > #16 0xffffffff8062a52f at ufs_reclaim+0x1f
> > #17 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
> >
> > Via KTR logging I determined that the dangling dependency was on a
> > freshly allocated buf, *after* vinvalbuf in vgonel() (so in VOP_RECLAIM
> > itself), called by the vnode lru cleanup process; I further noticed that
> > it was in a newbuf that recycled a bp (unimportant, except it let me
> > narrow down my logging to something manageable). From there I get this
> > stacktrace (simplified):
> >
> > #0  0xffffffff8043f160 at kdb_backtrace+0x60
> > #1  0xffffffff8049c98e at getnewbuf+0x4be
> > #2  0xffffffff804996a0 at getblk+0x830
> > #3  0xffffffff805fb207 at ffs_balloc_ufs2+0x1327
> > #4  0xffffffff80623b0b at ffs_write+0x33b
> > #5  0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
> > #6  0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
> > #7  0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
> > #8  0xffffffff80661cc1 at vnode_pager_putpages+0x91
> > #9  0xffffffff806588e6 at vm_pageout_flush+0x116
> > #10 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
> > #11 0xffffffff80651519 at vm_object_page_clean+0x179
> > #12 0xffffffff80651102 at vm_object_terminate+0xa2
> > #13 0xffffffff806621a5 at vnode_destroy_vobject+0x85
> > #14 0xffffffff8062a52f at ufs_reclaim+0x1f
> > #15 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
> > #16 0xffffffff804b6c6e at vgonel+0x2ee
> > #17 0xffffffff804ba6f5 at vnlru_proc+0x4b5
> >
> > addr2line on the ffs_balloc_ufs2 address gives
> > /usr/src/sys/ufs/ffs/ffs_balloc.c:778:
> >
> >                 bp = getblk(vp, lbn, nsize, 0, 0, gbflags);
> >                 bp->b_blkno = fsbtodb(fs, newb);
> >                 if (flags & BA_CLRBUF)
> >                         vfs_bio_clrbuf(bp);
> >                 if (DOINGSOFTDEP(vp))
> >                         softdep_setup_allocdirect(ip, lbn, newb, 0,
> >                             nsize, 0, bp);
> >
> > Boom, freshly allocated buffer with a dependency; nothing in VOP_RECLAIM
> > handles this. This is after vinvalbuf is called; it expects that
> > everything is flushed to disk and it's just about releasing structures
> > (that is my read of the code).
> >
> > Now, perhaps this is a good assumption? The question then is how this
> > buffer is hanging out there, surviving a vinvalbuf. I will note that my
> > test-case that finds this runs and terminates *minutes* before... it's
> > not just hanging out there in a race, it's surviving background sync,
> > fsync, etc... wtf? Also, I *can* unmount the FS without an error, so
> > that codepath is either ignoring this buffer, or it's forcing a sync in
> > a way that doesn't panic?
> Most typical cause for the buffer dependencies not flushed is a buffer
> write error. At least you could provide the printout of the buffer to
> confirm or reject this assumption.
>
> Were there any kernel messages right before the panic ? Just in case,
> did you fsck the volume before using it, after the previous panic ?
>
> > Anyone have next steps? I am making progress here, but it's really slow
> > going; this is probably the most complex portion of the kernel, and some
> > pointers would be helpful.
> >
> > On Sat, Jul 2, 2016 at 2:31 PM, David Cross <dcrosstech at gmail.com> wrote:
> >
> > > Ok, I have been trying to trace this down for a while.. I know quite
> > > a bit about it.. but there's a lot I don't know, or I would have a
> > > patch. I have been trying to solve this on my own, but bringing in
> > > some outside assistance will let me move on with my life.
> > >
> > > First up: the stacktrace (from a debugging kernel, with coredump):
> > >
> > > #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
> > > #1  0xffffffff8071018a in kern_reboot (howto=260)
> > >     at /usr/src/sys/kern/kern_shutdown.c:486
> > > #2  0xffffffff80710afc in vpanic (
> > >     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps
> > >     b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d",
> > >     ap=0xfffffe023ae5cf40) at /usr/src/sys/kern/kern_shutdown.c:889
> > > #3  0xffffffff807108c0 in panic (
> > >     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps
> > >     b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d")
> > >     at /usr/src/sys/kern/kern_shutdown.c:818
> > > #4  0xffffffff80a7c841 in softdep_deallocate_dependencies (
> > >     bp=0xfffffe01f030e148) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14099
> > > #5  0xffffffff807f793f in buf_deallocate (bp=0xfffffe01f030e148)
> > >     at buf.h:428
> > > #6  0xffffffff807f59c9 in brelse (bp=0xfffffe01f030e148)
> > >     at /usr/src/sys/kern/vfs_bio.c:1599
> > > #7  0xffffffff807f3132 in bufwrite (bp=0xfffffe01f030e148)
> > >     at /usr/src/sys/kern/vfs_bio.c:1180
> > > #8  0xffffffff80ab226a in bwrite (bp=0xfffffe01f030e148) at buf.h:395
> > > #9  0xffffffff80aafb1b in ffs_write (ap=0xfffffe023ae5d2b8)
> > >     at /usr/src/sys/ufs/ffs/ffs_vnops.c:800
> > > #10 0xffffffff80bdf0ed in VOP_WRITE_APV (vop=0xffffffff80f15480,
> > >     a=0xfffffe023ae5d2b8) at vnode_if.c:999
> > > #11 0xffffffff80b1d02e in VOP_WRITE (vp=0xfffff80077e7a000,
> > >     uio=0xfffffe023ae5d378, ioflag=8323232, cred=0xfffff80004235000)
> > >     at vnode_if.h:413
> > > #12 0xffffffff80b1ce97 in vnode_pager_generic_putpages
> > >     (vp=0xfffff80077e7a000, ma=0xfffffe023ae5d660, bytecount=16384,
> > >     flags=1, rtvals=0xfffffe023ae5d580)
> > >     at /usr/src/sys/vm/vnode_pager.c:1138
> > > #13 0xffffffff80805a57 in vop_stdputpages (ap=0xfffffe023ae5d478)
> > >     at /usr/src/sys/kern/vfs_default.c:760
> > > #14 0xffffffff80be201e in VOP_PUTPAGES_APV (vop=0xffffffff80f00218,
> > >     a=0xfffffe023ae5d478) at vnode_if.c:2861
> > > #15 0xffffffff80b1d7e3 in VOP_PUTPAGES (vp=0xfffff80077e7a000,
> > >     m=0xfffffe023ae5d660, count=16384, sync=1, rtvals=0xfffffe023ae5d580,
> > >     offset=0) at vnode_if.h:1189
> > > #16 0xffffffff80b196f3 in vnode_pager_putpages (object=0xfffff8014a1fce00,
> > >     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
> > >     at /usr/src/sys/vm/vnode_pager.c:1016
> > > #17 0xffffffff80b0a605 in vm_pager_put_pages (object=0xfffff8014a1fce00,
> > >     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
> > >     at vm_pager.h:144
> > > #18 0xffffffff80b0a18c in vm_pageout_flush (mc=0xfffffe023ae5d660,
> > >     count=4, flags=1, mreq=0, prunlen=0xfffffe023ae5d6f8,
> > >     eio=0xfffffe023ae5d77c) at /usr/src/sys/vm/vm_pageout.c:533
> > > #19 0xffffffff80afec76 in vm_object_page_collect_flush (
> > >     object=0xfffff8014a1fce00, p=0xfffff8023a882370, pagerflags=1,
> > >     flags=1, clearobjflags=0xfffffe023ae5d780, eio=0xfffffe023ae5d77c)
> > >     at /usr/src/sys/vm/vm_object.c:971
> > > #20 0xffffffff80afe91e in vm_object_page_clean (object=0xfffff8014a1fce00,
> > >     start=0, end=0, flags=1) at /usr/src/sys/vm/vm_object.c:897
> > > #21 0xffffffff80afe1fa in vm_object_terminate (object=0xfffff8014a1fce00)
> > >     at /usr/src/sys/vm/vm_object.c:735
> > > #22 0xffffffff80b1a0f1 in vnode_destroy_vobject (vp=0xfffff80077e7a000)
> > >     at /usr/src/sys/vm/vnode_pager.c:164
> > > #23 0xffffffff80abb191 in ufs_prepare_reclaim (vp=0xfffff80077e7a000)
> > >     at /usr/src/sys/ufs/ufs/ufs_inode.c:190
> > > #24 0xffffffff80abb1f9 in ufs_reclaim (ap=0xfffffe023ae5d968)
> > >     at /usr/src/sys/ufs/ufs/ufs_inode.c:219
> > > #25 0xffffffff80be0ade in VOP_RECLAIM_APV (vop=0xffffffff80f15ec0,
> > >     a=0xfffffe023ae5d968) at vnode_if.c:2019
> > > #26 0xffffffff80827849 in VOP_RECLAIM (vp=0xfffff80077e7a000,
> > >     td=0xfffff80008931960) at vnode_if.h:830
> > > #27 0xffffffff808219a9 in vgonel (vp=0xfffff80077e7a000)
> > >     at /usr/src/sys/kern/vfs_subr.c:2943
> > > #28 0xffffffff808294e8 in vlrureclaim (mp=0xfffff80008b2e000)
> > >     at /usr/src/sys/kern/vfs_subr.c:882
> > > #29 0xffffffff80828ea9 in vnlru_proc ()
> > >     at /usr/src/sys/kern/vfs_subr.c:1000
> > > #30 0xffffffff806b66c5 in fork_exit (callout=0xffffffff80828c50
> > >     <vnlru_proc>, arg=0x0, frame=0xfffffe023ae5dc00)
> > >     at /usr/src/sys/kern/kern_fork.c:1027
> > > #31 0xffffffff80b21dce in fork_trampoline ()
> > >     at /usr/src/sys/amd64/amd64/exception.S:611
> > > #32 0x0000000000000000 in ?? ()
> > >
> > > This is a kernel compiled -O -g; it's "almost" GENERIC, the only
> > > difference is some removed drivers. I have reproduced this on a few
> > > different kernels, including a bhyve one, so I can poke at it and not
> > > take out the main machine. The reproduction as it currently stands
> > > needs to have jails running, but I don't believe this is a jail
> > > interaction; I think it's just that the process that sets up the
> > > problem happens to be running in a jail. The step is: start the jail,
> > > then run "find /mountpoint -xdev >/dev/null" on the filesystem; when
> > > vnlru forces the problem vnode out, the system panics.
> > >
> > > I made a few modifications to the kernel to spit out information
> > > about the buf that causes the issue, but that is it.
> > >
> > > Information about the buf in question; it has a single soft-dependency
> > > worklist item, for direct allocation:
> > >
> > > (kgdb) print *bp->b_dep->lh_first
> > > $6 = {wk_list = {le_next = 0x0, le_prev = 0xfffffe01f030e378},
> > >   wk_mp = 0xfffff80008b2e000, wk_type = 4, wk_state = 163841}
> > >
> > > The file that maps to that buffer:
> > >
> > > ls -lh MOUNTPOINT/jails/mail/var/imap/db/__db.002
> > > -rw-------  1 cyrus  cyrus   24K Jul  1 20:32 MOUNTPOINT/jails/mail/var/imap/db/__db.002
> > >
> > > Any help is appreciated; until then I will keep banging my head
> > > against the proverbial wall on this :)
David Cross
2016-Jul-06 16:02 UTC
Reproducible panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST)
Oh, whoops; how do I print out the buffer?
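What Konstantin is presumably asking for is something along these lines in kgdb against the core; a sketch, assuming the brelse frame (frame 6 in the kgdb backtrace upthread) is selected and bp is in scope. A nonzero b_error, with BIO_ERROR (0x1) set in b_ioflags, would support the failed-write theory:

    (kgdb) frame 6                    # brelse (bp=0xfffffe01f030e148)
    (kgdb) print *bp                  # the whole struct buf
    (kgdb) print bp->b_error          # errno from a failed write, if any
    (kgdb) print /x bp->b_ioflags     # BIO_ERROR is 0x1
    (kgdb) print /x bp->b_flags
    (kgdb) print *bp->b_dep.lh_first  # the attached softdep work item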
On Wed, Jul 6, 2016 at 11:30 AM, David Cross <dcrosstech at gmail.com> wrote:
> On Wed, Jul 6, 2016 at 11:18 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
>> Most typical cause for the buffer dependencies not flushed is a buffer
>> write error. At least you could provide the printout of the buffer to
>> confirm or reject this assumption.
>>
>> Were there any kernel messages right before the panic ? Just in case,
>> did you fsck the volume before using it, after the previous panic ?