On 10/11/2016 19:55, Henri Hennebert wrote:
>
> On 11/10/2016 18:33, Andriy Gapon wrote:
>> On 10/11/2016 18:12, Henri Hennebert wrote:
>>> On 11/10/2016 16:54, Andriy Gapon wrote:
>>>> On 10/11/2016 17:20, Henri Hennebert wrote:
>>>>> On 11/10/2016 15:00, Andriy Gapon wrote:
>>>>>> Interesting. I can not spot any suspicious thread that would hold the vnode
>>>>>> lock. Could you please run kgdb (just like that, no arguments), then execute
>>>>>> 'bt' command and then select a frame when _vn_lock is called with 'fr N'
>>>>>> command. Then please 'print *vp' and share the result.
>>>>>>
>>>>> I think I miss something in your request:
>>>>
>>>> Oh, sorry! The very first step should be 'tid 101112' to switch to the correct
>>>> context.
>>>>
>>>
>>> (kgdb) fr 7
>>> #7 0xffffffff8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,
>>
>> "value optimized out" - not good
>>
>>> file=<value optimized out>,
>>> line=<value optimized out>) at vnode_if.h:859
>>> 859 vnode_if.h: No such file or directory.
>>> in vnode_if.h
>>> (kgdb) print *vp
>>
>> I am not sure if this output is valid, because of the message above.
>> Could you please try to navigate to nearby frames and see if vp itself has a
>> valid value there. If you can find such a frame please do *vp there.
>>
>
> Does this seem better?
Yes!
> (kgdb) fr 8
> #8 0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=2121728,
> td=0xfffff80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
> 2523 if ((error = vn_lock(vp, flags)) != 0) {
> (kgdb) print *vp
> $1 = {v_tag = 0xffffffff813be535 "zfs", v_op = 0xffffffff813d0f70,
>   v_data = 0xfffff80049c1f420, v_mount = 0xfffff800093aa660,
>   v_nmntvnodes = {tqe_next = 0xfffff80049c2c938, tqe_prev = 0xfffff80049c2bb30},
>   v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0},
>   v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0x0},
>   v_cache_dst = {tqh_first = 0xfffff800bfc8e3f0, tqh_last = 0xfffff800bfc8e410},
>   v_cache_dd = 0x0, v_lock = {lock_object = {lo_name = 0xffffffff813be535 "zfs",
>       lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 23,
>     lk_exslpfail = 0, lk_timo = 51, lk_pri = 96},
>   v_interlock = {lock_object = {lo_name = 0xffffffff8099e9e0 "vnode interlock",
>       lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4},
>   v_vnlock = 0xfffff80049c2c068,
>   v_actfreelist = {tqe_next = 0xfffff80049c2c938, tqe_prev = 0xfffff80049ae9bd0},
>   v_bufobj = {bo_lock = {lock_object = {lo_name = 0xffffffff8099e9f0
>         "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0},
>       rw_lock = 1}, bo_ops = 0xffffffff80c4bf70, bo_object = 0xfffff800b62e9c60,
>     bo_synclist = {le_next = 0x0, le_prev = 0x0},
>     bo_private = 0xfffff80049c2c000, __bo_vnode = 0xfffff80049c2c000,
>     bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff80049c2c120},
>       bv_root = {pt_root = 0}, bv_cnt = 0},
>     bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff80049c2c140},
>       bv_root = {pt_root = 0}, bv_cnt = 0},
>     bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
>   v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0,
>   v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff80049c2c188},
>     rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0,
>   v_holdcnt = 9, v_usecount = 6, v_iflag = 512, v_vflag = 32,
>   v_writecount = 0, v_hash = 4833984, v_type = VREG}
> (kgdb)
flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS |
LK_SHARE
So, here is what we have: this thread tries to get a shared lock on the vnode,
and the vnode is already locked in shared mode, but there is an exclusive waiter
(or, perhaps, multiple waiters). New shared requests are not granted while an
exclusive waiter is queued, so this thread cannot get the lock because of that
waiter. And I do not see an easy way to identify the waiter.
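For anyone following along, here is a minimal userland sketch of how those
lk_lock bits decode. The mask values below are my reading of sys/lockmgr.h for
this branch, so please verify them against your exact source tree before
trusting the output:

/*
 * decode_lk_lock.c -- standalone userland sketch, not kernel code.
 * Bit values assumed from sys/lockmgr.h; double-check them locally.
 */
#include <stdio.h>
#include <stdint.h>

#define LK_SHARE		0x01	/* lock is currently held in shared mode */
#define LK_SHARED_WAITERS	0x02	/* threads sleeping for a shared lock */
#define LK_EXCLUSIVE_WAITERS	0x04	/* threads sleeping for an exclusive lock */
#define LK_EXCLUSIVE_SPINNERS	0x08
#define LK_FLAGMASK		(LK_SHARE | LK_SHARED_WAITERS | \
				 LK_EXCLUSIVE_WAITERS | LK_EXCLUSIVE_SPINNERS)
#define LK_SHARERS_SHIFT	4
/* With LK_SHARE set, the bits above the flag bits hold the sharer count. */
#define LK_SHARERS(x)		(((x) & ~(uintmax_t)LK_FLAGMASK) >> LK_SHARERS_SHIFT)

int
main(void)
{
	uintmax_t lk_lock = 23;		/* the lk_lock value from the kgdb dump above */

	printf("shared: %s, shared waiters: %s, exclusive waiters: %s, sharers: %ju\n",
	    (lk_lock & LK_SHARE) ? "yes" : "no",
	    (lk_lock & LK_SHARED_WAITERS) ? "yes" : "no",
	    (lk_lock & LK_EXCLUSIVE_WAITERS) ? "yes" : "no",
	    LK_SHARERS(lk_lock));
	return (0);
}

With lk_lock = 23 this prints one sharer plus both waiter bits set, which is
the same LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS | LK_SHARE
decode as above.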
In the procstat output that you provided earlier there was no other thread in
vn_lock. Hmm, I see this:
procstat: sysctl: kern.proc.kstack: 14789: Device busy
procstat: sysctl: kern.proc.kstack: 82034: Device busy
Could you please check what those two processes are (if they are still running)?
Perhaps try procstat for each of the pids several times.
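Something along these lines should show what those pids are, assuming they are
still alive (adjust the flags to whatever your procstat version supports):

ps -ww -o pid,state,wchan,command -p 14789,82034
procstat -kk 14789
procstat -kk 82034

If kern.proc.kstack keeps returning "Device busy" for them, repeating procstat
a few times may eventually catch a stack.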
--
Andriy Gapon