Rick Macklem
2016-Sep-05 21:21 UTC
unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905]
Harry Schmalzbauer <freebsd at omnilan.de> wrote:
>Regarding Rick Macklem's message from 18.08.2016 02:03 (localtime):
>> Kostik wrote:
>> [stuff snipped]
>>> insmntque() performs the cleanup on its own, and that default cleanup is
>>> not suitable for the situation. I think that insmntque1() would better
>>> fit your requirements; you need to move the common code into a helper.
>>> It seems that unionfs_ins_cached_vnode() cleanup could reuse it.
>>
>> I've attached an updated patch (untested like the last one). This one creates a
>> custom version of insmntque_stddtr() that first calls unionfs_noderem() and then
>> does the same stuff as insmntque_stddtr(). This looks like it does the required
>> stuff (unionfs_noderem() is what the unionfs VOP_RECLAIM() does).
>> It switches the node back to using its own v_vnlock that is exclusively locked,
>> among other things.
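
For reference, a minimal sketch of the destructor approach described above. This is illustrative only, not the patch attached to the original mail (whose exact contents are not shown in this thread); the destructor name unionfs_insmntque_dtr() is invented here, while insmntque1(), insmntque_stddtr(), unionfs_noderem() and dead_vnodeops are the existing kernel interfaces being discussed:

	/*
	 * Illustrative sketch only -- NOT the attached patch.
	 */
	#include <sys/param.h>
	#include <sys/mount.h>
	#include <sys/proc.h>
	#include <sys/vnode.h>

	#include <fs/unionfs/union.h>

	static void
	unionfs_insmntque_dtr(struct vnode *vp, void *arg)
	{
		/*
		 * Undo the unionfs node setup first; unionfs_noderem()
		 * is what VOP_RECLAIM() uses, and among other things it
		 * switches the vnode back to its own, exclusively
		 * locked v_vnlock.
		 */
		unionfs_noderem(vp, curthread);

		/* Then the same steps as the stock insmntque_stddtr(). */
		vp->v_data = NULL;
		vp->v_op = &dead_vnodeops;
		vgone(vp);
		vput(vp);
	}

	/*
	 * In unionfs_nodeget(), the plain insmntque(vp, mp) call would
	 * then become:
	 *
	 *	error = insmntque1(vp, mp, unionfs_insmntque_dtr, NULL);
	 */

insmntque1() exists precisely so that a filesystem can substitute its own cleanup for the default insmntque_stddtr() when inserting the vnode into the mount's vnode list fails.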
>
>Thanks a lot, today I gave it a try.
>
>With this patch, one reproducible panic can still be easily triggered:
>I have directory A unionfs-mounted under directory B.
>Then I mount_unionfs the same directory A below another directory C.
>
>panic: __lockmgr_args: downgrade a recursed lockmgr nfs @
>/usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905
>
>The result is this backtrace, hardly helpful I guess:
>
>#1  0xffffffff80ae5fd9 in kern_reboot (howto=260) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:366
>#2  0xffffffff80ae658b in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:759
>#3  0xffffffff80ae63c3 in panic (fmt=0x0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:690
>#4  0xffffffff80ab7ab7 in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_lock.c:992
>#5  0xffffffff80ba510c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
>#6  0xffffffff8111932d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
>#7  0xffffffff80a18cfc in unionfs_lock (ap=0xfffffe007a3ba6a0) at vnode_if.h:859
>#8  0xffffffff8111932d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
>#9  0xffffffff80bc9b93 in _vn_lock (vp=<value optimized out>, flags=66560, file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
>#10 0xffffffff80a18460 in unionfs_readdir (ap=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1531
>#11 0xffffffff81118ecf in VOP_READDIR_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1822
>#12 0xffffffff80bc6e3b in kern_getdirentries (td=<value optimized out>, fd=<value optimized out>, buf=0x800c3d000 <Address 0x800c3d000 out of bounds>, count=<value optimized out>, basep=0xfffffe007a3ba980, residp=0x0) at vnode_if.h:758
>#13 0xffffffff80bc6bf8 in sys_getdirentries (td=0x0, uap=0xfffffe007a3baa40) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/vfs_syscalls.c:3940
>#14 0xffffffff80fad6b8 in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
>#15 0xffffffff80f8feab in Xfast_syscall () at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:396
>#16 0x0000000000452eea in ?? ()
>Previous frame inner to this frame (corrupt stack?)

Ok, I finally got around to looking at this, and the panic() looks like a
pretty straightforward bug in the unionfs code:
- In unionfs_readdir(), it does a vn_lock(..LK_UPGRADE) and then later in
  the code a vn_lock(..LK_DOWNGRADE) if it did the upgrade. (At line #1531,
  as noted in the backtrace.)
- In unionfs_lock(), it sets LK_CANRECURSE when it is the rootvp and
  LK_EXCLUSIVE, so it allows recursive acquisition in this case.
--> Then it would call vn_lock(..LK_DOWNGRADE), which panics if the lock
  has recursed.

Now, I'll admit unionfs_lock() is too obscure for me to understand, but...
Is it necessary to vn_lock(..LK_DOWNGRADE), or can unionfs_readdir() just
return with the vnode exclusively locked?
(It would be easy to change the code to avoid the vn_lock(..LK_DOWNGRADE)
call when it has done the vn_lock(..LK_EXCLUSIVE) after the
vn_lock(..LK_UPGRADE) fails.)

rick

>I ran your previous patch for some time.
>Similarly, mounting one directory below a 2nd mountpoint crashed the
>machine (I forgot to configure the dump dir, so I can't compare the
>backtrace with the one from the current patch).
>Otherwise, at least with the previous patch, I haven't had any other
>panic for about one week.
>
>Thanks,
>
>-Harry
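
A minimal sketch of the pattern Rick describes, as it would sit inside unionfs_readdir(). Illustrative only: the variable names are invented and this is not the actual union_vnops.c code:

	/*
	 * Illustrative sketch of the locking dance behind the panic.
	 */
	struct vnode *vp = ap->a_vp;	/* the unionfs directory vnode */
	int relocked = 0;

	if (VOP_ISLOCKED(vp) != LK_EXCLUSIVE) {
		if (vn_lock(vp, LK_UPGRADE) != 0) {
			/*
			 * The upgrade failed and dropped the shared lock,
			 * so a fresh exclusive lock is taken instead.  On
			 * the unionfs root vnode, unionfs_lock() adds
			 * LK_CANRECURSE, so this acquisition may recurse.
			 */
			vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
		}
		relocked = 1;
	}

	/* ... read the directory entries from the upper/lower vnodes ... */

	if (relocked) {
		/*
		 * This is the panic at union_vnops.c:1905: lockmgr
		 * refuses to LK_DOWNGRADE an exclusive lock that has
		 * recursed.  The easy change Rick suggests is to skip
		 * this downgrade (returning with the vnode exclusively
		 * locked) whenever the LK_UPGRADE failed and a fresh
		 * LK_EXCLUSIVE was taken instead.
		 */
		vn_lock(vp, LK_DOWNGRADE | LK_RETRY);
	}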
Harry Schmalzbauer
2017-Mar-07 12:42 UTC
unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905]
Regarding Rick Macklem's message from 05.09.2016 23:21 (localtime):
> Harry Schmalzbauer <freebsd at omnilan.de> wrote:
> [stuff snipped -- full quote of the 2016-Sep-05 message above]
>
> Ok, I finally got around to looking at this, and the panic() looks like a
> pretty straightforward bug in the unionfs code.
> [stuff snipped]
>
> rick

Something ufs-related seems to have tightened the unionfs locking problem
in stable/11. With Rick's latest patch, the machine now instantaneously
panics during boot, right after mounting root. Unfortunately I don't have
swap available on that machine (yet), but maybe this is a hint for anybody:

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00982220e0
vpanic() at vpanic+0x186/frame 0xfffffe0098222160
kassert_panic() at kassert_panic+0x126/frame 0xfffffe00982221d0
witness_assert() at witness_assert+0x35a/frame 0xfffffe0098222230
__lockmgr_args() at __lockmgr_args+0x517/frame 0xfffffe00982222d0
vop_stdunlock() at vop_stdunlock+0x3b/frame 0xfffffe00982222f0
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe0098222320
unionfs_unlock() at unionfs_unlock+0x112/frame 0xfffffe0098222390
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe00982223c0
unionfs_nodeget() at unionfs_nodeget+0x3ef/frame 0xfffffe0098222470
unionfs_domount() at unionfs_domount+0x518/frame 0xfffffe00982226b0
vfs_donmount() at vfs_donmount+0xe37/frame 0xfffffe00982228f0
sys_nmount() at sys_nmount+0x72/frame 0xfffffe0098222930
amd64_syscall() at amd64_syscall+0x2f9/frame 0xfffffe0098222ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0098222ab0
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80086ecea, rsp = 0x7fffffffe318, rbp = 0x7fffffffeca0 ---

-harry
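
For what it's worth, this trace shows witness_assert() firing on the unlock path (vop_stdunlock() via unionfs_unlock()) while unionfs_nodeget() is still setting up the node at mount time. One speculative reading -- an assumption, not a diagnosis confirmed anywhere in this thread -- is a mismatch between the lock that was taken and the lock that is released once v_vnlock changes hands:

	/*
	 * Speculative illustration only, not a confirmed diagnosis of
	 * the trace above.  unionfs shares the upper vnode's lock by
	 * pointing v_vnlock at it.  If a cleanup path (such as an
	 * insmntque1() destructor that calls unionfs_noderem())
	 * switches v_vnlock back to the node's own lock while the
	 * shared lock is still the one this thread holds, a later
	 * VOP_UNLOCK() releases a lock the thread never acquired, and
	 * WITNESS catches that in __lockmgr_args().
	 */
	vn_lock(uppervp, LK_EXCLUSIVE | LK_RETRY);
	vp->v_vnlock = uppervp->v_vnlock;  /* unionfs-style lock sharing */

	/* ... an error path runs the destructor, which does ... */
	vp->v_vnlock = &vp->v_lock;        /* node back on its own lock */

	VOP_UNLOCK(vp, 0);                 /* unlocks &vp->v_lock, which this
	                                      thread does not hold -> panic */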