thr3ads.net - freebsd stable - Process stuck in "vnread" [Mar 2016]

Konstantin Belousov

2016-Mar-02 09:53 UTC

Process stuck in "vnread"

On Wed, Mar 02, 2016 at 01:12:31AM -0800, Maxim Sobolev wrote:
> Hi, I've encountered cp(1) process stuck in the vnread state on one of my
> build machines that got recently upgraded to 10.3.
> 
>    0 79596     1   0  20  0   17092    1396 wait     I     1       0:00.00 /bin/sh /usr/local/bin/autoreconf -f -i
>    0 79602 79596   0  52  0   41488    9036 wait     I     1       0:00.07 /usr/local/bin/perl -w /usr/local/bin/autoreconf-2.69 -f -i
>    0 79639 79602   0  72  0       0       0 -        Z     1       0:00.27 <defunct>
>    0 79762 79602   0  20  0   17092    1396 wait     I     1       0:00.00 /bin/sh /usr/local/bin/automake --add-missing --copy --force-missing
>    0 79768 79762   0  52  0   49736   13936 wait     I     1       0:00.11 /usr/local/bin/perl -w /usr/local/bin/automake-1.15 --add-missing --copy --force-missing
>    0 79962 79768   0  20  0   12368    1024 vnread   DL    1       0:00.00 cp /usr/local/share/automake-1.15/compile ./compile
> 
> I am not sure if it's related to that OS version upgrade, but I have not
> seen any such issues on the same machine in 2-3 years running essentially
> the same build process with version 9.x, 10.0, 10.1 and 10.2.
> 
> $ uname -a
> FreeBSD van01.sippysoft.com 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #1
> 80de3e2(master)-dirty: Tue Feb  2 12:19:57 PST 2016
> sobomax at abc.sippysoft.com:/usr/obj/usr/home/sobomax/projects/freebsd103/sys/ABC
>  amd64
> 
> The kernel stack trace is:
> 
> (kgdb) thread 360
> [Switching to thread 360 (Thread 100515)]#0  0xffffffff8095244e in
> sched_switch ()
> (kgdb) bt
> #0  0xffffffff8095244e in sched_switch ()
> #1  0xffffffff809313b1 in mi_switch ()
> #2  0xffffffff8097089a in sleepq_wait ()
> #3  0xffffffff80930dd7 in _sleep ()
> #4  0xffffffff809b230e in bwait ()
> #5  0xffffffff80b511f3 in vnode_pager_generic_getpages ()
> #6  0xffffffff80dd1607 in VOP_GETPAGES_APV ()
> #7  0xffffffff80b4f59a in vnode_pager_getpages ()
> #8  0xffffffff80b30031 in vm_fault_hold ()
> #9  0xffffffff80b2f797 in vm_fault ()
> #10 0xffffffff80cb5a75 in trap_pfault ()
> #11 0xffffffff80cb51dd in trap ()
> #12 0xffffffff80c9b122 in calltrap ()
> #13 0xffffffff80cb36f1 in copyin ()
> #14 0xffffffff80977ddf in uiomove_faultflag ()

The backtrace indicates, with 99% certainty, that the issue is the
requested read never finishing.  But the backtrace is obviously not
complete, and there might be something more happening.  At least,
we have not handled page-ins during uiomove() on user I/O for quite
some time.

If the vnode whose I/O hung is on UFS over md, you should look at the md
worker thread state.
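
Picking out the threads worth inspecting from a long ps listing can be
scripted. A minimal sketch in Python, assuming the listing uses the
13-column `ps -l` style layout seen in the report (UID, PID, PPID, CPU,
PRI, NI, VSZ, RSS, MWCHAN, STAT, TT, TIME, COMMAND). The column names and
the function name blocked_processes are assumptions: the thread shows the
rows but neither the header nor the exact ps flags used.

```python
# Assumed column order; the report shows only the rows, not the header.
PS_COLUMNS = ["UID", "PID", "PPID", "CPU", "PRI", "NI", "VSZ", "RSS",
              "MWCHAN", "STAT", "TT", "TIME", "COMMAND"]

# Rows copied from the report, with the mail wrapping undone.
SAMPLE = """\
   0 79768 79762   0  52  0   49736   13936 wait     I     1       0:00.11 /usr/local/bin/perl -w /usr/local/bin/automake-1.15 --add-missing --copy --force-missing
   0 79639 79602   0  72  0       0       0 -        Z     1       0:00.27 <defunct>
   0 79962 79768   0  20  0   12368    1024 vnread   DL    1       0:00.00 cp /usr/local/share/automake-1.15/compile ./compile
"""


def blocked_processes(ps_text):
    """Return (pid, wait channel, command) for rows whose STAT starts with 'D'."""
    blocked = []
    for line in ps_text.splitlines():
        # Split into at most 13 fields so the command keeps its spaces.
        fields = line.split(None, len(PS_COLUMNS) - 1)
        if len(fields) < len(PS_COLUMNS) or not fields[1].isdigit():
            continue  # header line or something that is not a process row
        row = dict(zip(PS_COLUMNS, fields))
        if row["STAT"].startswith("D"):  # uninterruptible (disk) wait
            blocked.append((int(row["PID"]), row["MWCHAN"],
                            row["COMMAND"].strip()))
    return blocked


if __name__ == "__main__":
    for pid, wchan, command in blocked_processes(SAMPLE):
        print(pid, wchan, command)
```

Kernel threads such as an md worker show up in the same listing with the
name in brackets, so the same filter applies to them.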
> 
> The FS stack configuration is somewhat unique, so I am not sure if I am
> hitting some rare race condition or lock ordering issues specific to that.
> It's basically ZFS (ZRAID) on top of pair of SATA SSDs with big file on
> that FS attached via md(4) and UFS2 on that md(4). The build itself runs in
> chroot with that UFS2 fs as its primary root.
> 
> Just maybe additional bit of info, attempting to list the directory with
> that UFS image also got my bash process stuck in "zfs" state, backtrace
> from that is:

A deadlock in the underlying io layer is consistent with this (secondary)
observation.
> 
> (kgdb) thread 353
> [Switching to thread 353 (Thread 100508)]#0  0xffffffff8095244e in
> sched_switch ()
> (kgdb) bt
> #0  0xffffffff8095244e in sched_switch ()
> #1  0xffffffff809313b1 in mi_switch ()
> #2  0xffffffff8097089a in sleepq_wait ()
> #3  0xffffffff809069ad in sleeplk ()
> #4  0xffffffff809060e0 in __lockmgr_args ()
> #5  0xffffffff809b8b7c in vop_stdlock ()
> #6  0xffffffff80dd0a3b in VOP_LOCK1_APV ()
> #7  0xffffffff809d6d23 in _vn_lock ()
> #8  0xffffffff81a8c9cd in ?? ()
> #9  0x0000000000000000 in ?? ()

Maxim Sobolev

2016-Mar-02 11:02 UTC

Thanks, Konstantin.

Re: md(4) state:

   0 88688     0   0  -8  0       0      16 tx->tx_s DL    -       0:45.43
[md0]

Its backtrace:


About the backtrace: indeed, it looks like you are right and some portion of
it is not decoded properly, because that code is loaded as a kernel module.
The setup is even more complicated than I described: /usr/ports is mounted
via NULLFS, so in this command:

cp /usr/local/share/automake-1.15/compile ./compile

the target (i.e. ./compile) is a path on ZFS that is exported via
NULLFS, while the source is a file on UFS2->md->ZFS. This is probably the
reason the stack trace is incomplete: both zfs.ko and nullfs.ko are loaded as
modules and the next few frames point into those. Unfortunately I cannot
get kgdb to read symbols from those .ko's and decode them.

#13 0xffffffff80cb36f1 in copyin ()
#14 0xffffffff80977ddf in uiomove_faultflag ()
#15 0xffffffff819f699c in ?? ()
#16 0xfffffe0468a861a0 in ?? ()
#17 0xfffff80000000000 in ?? ()
#18 0xfffffe0468a861a0 in ?? ()
#19 0xfffff80176b39420 in ?? ()
#20 0x0000000000000001 in ?? ()

$ kldstat | grep 0xffffffff819
 2    1 0xffffffff819bd000 aef8     nullfs.ko
 3    1 0xffffffff819c8000 2fd2f0   zfs.ko
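
The load address and size in the kldstat lines above are enough to
attribute an undecoded frame to a module by hand, even without symbols. A
minimal sketch in Python, assuming the kldstat layout shown above (Id,
Refs, Address, Size in hex without a 0x prefix, Name); the helper names
parse_kldstat and resolve are hypothetical, not part of any FreeBSD tool.

```python
# kldstat rows copied from the message above.
KLDSTAT_OUTPUT = """\
 2    1 0xffffffff819bd000 aef8     nullfs.ko
 3    1 0xffffffff819c8000 2fd2f0   zfs.ko
"""


def parse_kldstat(text):
    """Return a list of (name, base, size) tuples from kldstat output."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or not fields[2].startswith("0x"):
            continue  # header line or anything that is not a module row
        base = int(fields[2], 16)
        size = int(fields[3], 16)  # size column is hex, as in the rows above
        modules.append((fields[4], base, size))
    return modules


def resolve(addr, modules):
    """Return (module name, offset into module), or None if no module matches."""
    for name, base, size in modules:
        if base <= addr < base + size:
            return name, addr - base
    return None


if __name__ == "__main__":
    modules = parse_kldstat(KLDSTAT_OUTPUT)
    # Frame #15 of the cp backtrace and frame #8 of the bash backtrace.
    for addr in (0xffffffff819f699c, 0xffffffff81a8c9cd):
        hit = resolve(addr, modules)
        if hit is None:
            print(hex(addr), "not in a listed module")
        else:
            print(hex(addr), hit[0], "+", hex(hit[1]))
```

Under these assumptions both frame #15 above (0xffffffff819f699c) and
frame #8 of the bash backtrace (0xffffffff81a8c9cd) fall inside zfs.ko,
and neither lies in nullfs.ko.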





-- 
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
Tel (Canada): +1-778-783-0474
Tel (Toll-Free): +1-855-747-7779
Fax: +1-866-857-6942
Web: http://www.sippysoft.com
MSN: sales at sippysoft.com
Skype: SippySoft
