Thanks, Konstantin.
Re: md(4) state:
0 88688 0 0 -8 0 0 16 tx->tx_s DL - 0:45.43 [md0]
Its backtrace:
As for the backtrace: yes, you are right, part of it is not decoded
properly, because the code in those frames is loaded as a kernel module.
The setup is also somewhat more complicated than I described: /usr/ports is
mounted via NULLFS, so in this command:
cp /usr/local/share/automake-1.15/compile ./compile
The target (i.e. ./compile) here is a path on ZFS that is exported via
NULLFS, while the source is a file on UFS2->md->ZFS. This is probably why
the stack trace is incomplete: both zfs.ko and nullfs.ko are loaded as
modules, and the next few frames point into those. Unfortunately I cannot
get kgdb to read symbols from those .ko's and decode the frames.
#13 0xffffffff80cb36f1 in copyin ()
#14 0xffffffff80977ddf in uiomove_faultflag ()
#15 0xffffffff819f699c in ?? ()
#16 0xfffffe0468a861a0 in ?? ()
#17 0xfffff80000000000 in ?? ()
#18 0xfffffe0468a861a0 in ?? ()
#19 0xfffff80176b39420 in ?? ()
#20 0x0000000000000001 in ?? ()
$ kldstat | grep 0xffffffff819
2 1 0xffffffff819bd000 aef8 nullfs.ko
3 1 0xffffffff819c8000 2fd2f0 zfs.ko
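
For what it's worth, the module that owns an undecoded frame can be worked
out by hand from the kldstat address and size columns. Below is a minimal
Python sketch of that arithmetic. The helper names (parse_kldstat,
find_module) are mine, and I am assuming the size column is hex without a
0x prefix. It only yields module+offset relative to the load base, not a
symbol name.

```python
# Sketch: map a raw kernel address to the loaded module that contains it,
# using the Address and Size columns of kldstat(8) output.
# Assumption: the Size column is hexadecimal without a 0x prefix.

KLDSTAT = """\
2 1 0xffffffff819bd000 aef8 nullfs.ko
3 1 0xffffffff819c8000 2fd2f0 zfs.ko
"""


def parse_kldstat(text):
    """Return a list of (name, base, size) tuples, one per module line."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # not a module line
        _id, _refs, base, size, name = fields
        try:
            modules.append((name, int(base, 16), int(size, 16)))
        except ValueError:
            continue  # e.g. the "Id Refs Address Size Name" header
    return modules


def find_module(addr, modules):
    """Return (name, offset) for the module containing addr, else None."""
    for name, base, size in modules:
        if base <= addr < base + size:
            return name, addr - base
    return None


modules = parse_kldstat(KLDSTAT)

# Frame #15 of the cp thread and frame #8 of the bash thread.
for label, addr in (("cp #15", 0xffffffff819f699c),
                    ("bash #8", 0xffffffff81a8c9cd)):
    hit = find_module(addr, modules)
    if hit is None:
        print("%s: not in any listed module" % label)
    else:
        print("%s: %s+0x%x" % (label, hit[0], hit[1]))
```

With the two kldstat lines above, both addresses fall inside zfs.ko
(offsets 0x2e99c and 0xc49cd from its load base), not nullfs.ko.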
On Wed, Mar 2, 2016 at 1:53 AM, Konstantin Belousov <kostikbel at gmail.com>
wrote:
> On Wed, Mar 02, 2016 at 01:12:31AM -0800, Maxim Sobolev wrote:
> > Hi, I've encountered a cp(1) process stuck in the vnread state on one of
> > my build machines that was recently upgraded to 10.3.
> >
> > 0 79596 1 0 20 0 17092 1396 wait I 1 0:00.00 /bin/sh /usr/local/bin/autoreconf -f -i
> > 0 79602 79596 0 52 0 41488 9036 wait I 1 0:00.07 /usr/local/bin/perl -w /usr/local/bin/autoreconf-2.69 -f -i
> > 0 79639 79602 0 72 0 0 0 - Z 1 0:00.27 <defunct>
> > 0 79762 79602 0 20 0 17092 1396 wait I 1 0:00.00 /bin/sh /usr/local/bin/automake --add-missing --copy --force-missing
> > 0 79768 79762 0 52 0 49736 13936 wait I 1 0:00.11 /usr/local/bin/perl -w /usr/local/bin/automake-1.15 --add-missing --copy --force-missing
> > 0 79962 79768 0 20 0 12368 1024 vnread DL 1 0:00.00 cp /usr/local/share/automake-1.15/compile ./compile
> >
> > I am not sure if it's related to the OS version upgrade, but I have not
> > seen any such issues on the same machine in 2-3 years of running
> > essentially the same build process with versions 9.x, 10.0, 10.1 and 10.2.
> >
> > $ uname -a
> > FreeBSD van01.sippysoft.com 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #1
> > 80de3e2(master)-dirty: Tue Feb 2 12:19:57 PST 2016
> > sobomax at abc.sippysoft.com:/usr/obj/usr/home/sobomax/projects/freebsd103/sys/ABC
> > amd64
> >
> > The kernel stack trace is:
> >
> > (kgdb) thread 360
> > [Switching to thread 360 (Thread 100515)]#0 0xffffffff8095244e in
> > sched_switch ()
> > (kgdb) bt
> > #0 0xffffffff8095244e in sched_switch ()
> > #1 0xffffffff809313b1 in mi_switch ()
> > #2 0xffffffff8097089a in sleepq_wait ()
> > #3 0xffffffff80930dd7 in _sleep ()
> > #4 0xffffffff809b230e in bwait ()
> > #5 0xffffffff80b511f3 in vnode_pager_generic_getpages ()
> > #6 0xffffffff80dd1607 in VOP_GETPAGES_APV ()
> > #7 0xffffffff80b4f59a in vnode_pager_getpages ()
> > #8 0xffffffff80b30031 in vm_fault_hold ()
> > #9 0xffffffff80b2f797 in vm_fault ()
> > #10 0xffffffff80cb5a75 in trap_pfault ()
> > #11 0xffffffff80cb51dd in trap ()
> > #12 0xffffffff80c9b122 in calltrap ()
> > #13 0xffffffff80cb36f1 in copyin ()
> > #14 0xffffffff80977ddf in uiomove_faultflag ()
> The backtrace indicates, with 99% certainty, that the issue is in the
> requested read never finishing. But the backtrace is obviously not
> complete, and there might be something more happening. At least,
> we do not handle page-ins during uiomove() on user io for quite
> some time.
>
> If the vnode whose io hung is on UFS over md, you should look at the md
> worker thread state.
>
> >
> > The FS stack configuration is somewhat unique, so I am not sure if I am
> > hitting some rare race condition or lock ordering issue specific to that.
> > It's basically ZFS (ZRAID) on top of a pair of SATA SSDs, with a big file
> > on that FS attached via md(4) and UFS2 on that md(4). The build itself
> > runs in a chroot with that UFS2 fs as its primary root.
> >
> > Just maybe an additional bit of info: attempting to list the directory
> > with that UFS image also got my bash process stuck in the "zfs" state.
> > The backtrace from that is:
> A deadlock in the underlying io layer is consistent with this (secondary)
> observation.
>
> >
> > (kgdb) thread 353
> > [Switching to thread 353 (Thread 100508)]#0 0xffffffff8095244e in
> > sched_switch ()
> > (kgdb) bt
> > #0 0xffffffff8095244e in sched_switch ()
> > #1 0xffffffff809313b1 in mi_switch ()
> > #2 0xffffffff8097089a in sleepq_wait ()
> > #3 0xffffffff809069ad in sleeplk ()
> > #4 0xffffffff809060e0 in __lockmgr_args ()
> > #5 0xffffffff809b8b7c in vop_stdlock ()
> > #6 0xffffffff80dd0a3b in VOP_LOCK1_APV ()
> > #7 0xffffffff809d6d23 in _vn_lock ()
> > #8 0xffffffff81a8c9cd in ?? ()
> > #9 0x0000000000000000 in ?? ()
>
>
--
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
Tel (Canada): +1-778-783-0474
Tel (Toll-Free): +1-855-747-7779
Fax: +1-866-857-6942
Web: http://www.sippysoft.com
MSN: sales at sippysoft.com
Skype: SippySoft