olivier olivier
2012-Dec-03 18:41 UTC
NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE
Hi all
After upgrading from 9.0-RELEASE to 9.1-PRERELEASE #0 r243679 I'm having
severe problems with NFS sharing of a ZFS volume. nfsd appears to hang at
random times (anywhere from once every couple of hours to once every two
days) while accessing a ZFS volume, and the only way I have found to resolve
the problem is to reboot. The server console is sometimes still responsive
during the nfsd hang, and I can read and write files on the same ZFS volume
while nfsd is hung. I am pasting below the output of procstat -kk on nfsd,
and details of my pool (nfsstat on the server hangs once the problem has
started occurring and does not produce any output). The pool is v28 and was
created from a bunch of volumes attached over Fibre Channel using the mpt
driver. My system has a Supermicro board and 4 AMD Opteron 6274 CPUs.
I did not experience any nfsd hangs with 9.0-RELEASE (same machine,
essentially same configuration, same usage pattern).
I would greatly appreciate any help to resolve this problem!
Thank you
Olivier
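A minimal sketch of the diagnostic commands referenced above, assuming the
nfsd PIDs are found with pgrep (any equivalent method of locating the
processes works the same):

  # Kernel stack traces of all nfsd threads (the output pasted below)
  procstat -kk $(pgrep nfsd)
  # Server-side NFS statistics (hangs once the problem has started)
  nfsstat -s
  # Pool layout and health (the output pasted at the end)
  zpool status tank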
PID TID COMM TDNAME KSTACK
1511 102751 nfsd nfsd: master
mi_switch+0x186
sleepq_wait+0x42
__lockmgr_args+0x5ae
vop_stdlock+0x39
VOP_LOCK1_APV+0x46
_vn_lock+0x47
zfs_fhtovp+0x338
nfsvno_fhtovp+0x87
nfsd_fhtovp+0x7a
nfsrvd_dorpc+0x9cf
nfssvc_program+0x447
svc_run_internal+0x687
svc_run+0x8f
nfsrvd_nfsd+0x193
nfssvc_nfsd+0x9b
sys_nfssvc+0x90
amd64_syscall+0x540
Xfast_syscall+0xf7
1511 102752 nfsd nfsd: service
mi_switch+0x186
sleepq_wait+0x42
__lockmgr_args+0x5ae
vop_stdlock+0x39
VOP_LOCK1_APV+0x46
_vn_lock+0x47
zfs_fhtovp+0x338
nfsvno_fhtovp+0x87
nfsd_fhtovp+0x7a
nfsrvd_dorpc+0x9cf
nfssvc_program+0x447
svc_run_internal+0x687
svc_thread_start+0xb
fork_exit+0x11f
fork_trampoline+0xe
1511 102753 nfsd nfsd: service
mi_switch+0x186
sleepq_wait+0x42
_cv_wait+0x112
zio_wait+0x61
zil_commit+0x764
zfs_freebsd_write+0xba0
VOP_WRITE_APV+0xb2
nfsvno_write+0x14d
nfsrvd_write+0x362
nfsrvd_dorpc+0x3c0
nfssvc_program+0x447
svc_run_internal+0x687
svc_thread_start+0xb
fork_exit+0x11f
fork_trampoline+0xe
1511 102754 nfsd nfsd: service
mi_switch+0x186
sleepq_wait+0x42
_cv_wait+0x112
zio_wait+0x61
zil_commit+0x3cf
zfs_freebsd_fsync+0xdc
nfsvno_fsync+0x2f2
nfsrvd_commit+0xe7
nfsrvd_dorpc+0x3c0
nfssvc_program+0x447
svc_run_internal+0x687
svc_thread_start+0xb
fork_exit+0x11f
fork_trampoline+0xe
1511 102755 nfsd nfsd: service
mi_switch+0x186
sleepq_wait+0x42
__lockmgr_args+0x5ae
vop_stdlock+0x39
VOP_LOCK1_APV+0x46
_vn_lock+0x47
zfs_fhtovp+0x338
nfsvno_fhtovp+0x87
nfsd_fhtovp+0x7a
nfsrvd_dorpc+0x9cf
nfssvc_program+0x447
svc_run_internal+0x687
svc_thread_start+0xb
fork_exit+0x11f
fork_trampoline+0xe
1511 102756 nfsd nfsd: service
mi_switch+0x186
sleepq_wait+0x42
_cv_wait+0x112
zil_commit+0x6d
zfs_freebsd_write+0xba0
VOP_WRITE_APV+0xb2
nfsvno_write+0x14d
nfsrvd_write+0x362
nfsrvd_dorpc+0x3c0
nfssvc_program+0x447
svc_run_internal+0x687
svc_thread_start+0xb
fork_exit+0x11f
fork_trampoline+0xe
PID TID COMM TDNAME KSTACK
1507 102750 nfsd -
mi_switch+0x186
sleepq_catch_signals+0x2e1
sleepq_wait_sig+0x16
_cv_wait_sig+0x12a
seltdwait+0xf6
kern_select+0x6ef
sys_select+0x5d
amd64_syscall+0x540
Xfast_syscall+0xf7
pool: tank
state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on software that does not support feature
flags.
scan: scrub repaired 0 in 45h37m with 0 errors on Mon Dec 3 03:07:11 2012
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
da19 ONLINE 0 0 0
da31 ONLINE 0 0 0
da32 ONLINE 0 0 0
da33 ONLINE 0 0 0
da34 ONLINE 0 0 0
raidz1-1 ONLINE 0 0 0
da20 ONLINE 0 0 0
da36 ONLINE 0 0 0
da37 ONLINE 0 0 0
da38 ONLINE 0 0 0
da39 ONLINE 0 0 0
Rick Macklem
2012-Dec-04 14:26 UTC
NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE
Olivier wrote:
> nfsd appears to hang at random times while accessing a ZFS volume, and
> the only way I have found to resolve the problem is to reboot.
> [...]
These threads are either waiting for a vnode lock or waiting inside
zil_commit() (at 3 different locations in zil_commit()). A guess would be
that the ZIL hasn't completed a write for some reason, so 3 threads are
waiting for it, while one of them holds a lock on the vnode being written
and the remaining threads are waiting for that vnode lock.
I am not a ZFS guy, so I cannot help further, except to suggest that you try
to determine what might cause a write to the ZIL to stall. (Different
device, different device driver...) Good luck with it, rick
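A rough sketch of one way to check whether the pool's devices are still
completing I/O while nfsd is hung; the pool name tank is taken from the
status output above, and the gstat column names are those of stock FreeBSD:

  # Per-vdev read/write ops once per second; all zeros across the board
  # while writes are outstanding suggests stalled I/O.
  zpool iostat -v tank 1
  # GEOM-level view; look for devices with a nonzero L(q) (queued
  # requests) that never drains.
  gstat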
Andriy Gapon
2012-Dec-13 10:36 UTC
NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE
I decided to share here the comment that I made in private, so that more
people could potentially benefit from it.

on 03/12/2012 20:41 olivier olivier said the following:
> nfsd appears to hang at random times while accessing a ZFS volume, and
> the only way I have found to resolve the problem is to reboot.
> [...]

I've looked at the provided data and I do not see anything that implicates
ZFS. My rules of thumb for ZFS hangs:
- if there are threads in zio_wait,
- if you can confirm that they are indeed stuck there [*], and
- if there are no threads in zio_interrupt,
then it is most likely that the problem is at the storage level.

[*] you have to be sure that a thread just sits in zio_wait and doesn't make
any forward progress, as opposed to the thread doing a lot of I/O and thus
having a high probability of being seen in zio_wait.

Most likely it is a bug in the storage controller driver that allowed an I/O
request to get lost (instead of being errored out or timed out).
`camcontrol tags <disk> -v` can be used to query the depth of the queue for
each disk and determine the bad one.
-- Andriy Gapon
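A minimal sketch of how the checks described above might be run, assuming
the da* CAM disks listed in the pool config; enumerating devices via
kern.disks is just one way to loop over them:

  # Look for ZFS I/O threads stuck waiting with no completions arriving.
  procstat -kk -a | grep zio_wait       # threads blocked in zio_wait
  procstat -kk -a | grep zio_interrupt  # completion threads; none while
                                        # zio_wait persists is suspicious
  # Query the CAM tag/queue state of every disk to spot one holding
  # requests that never complete.
  for d in $(sysctl -n kern.disks); do
          echo "=== $d ==="
          camcontrol tags $d -v
  done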