Harry Schmalzbauer
2017-Mar-08 08:50 UTC
unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905]
Regarding Konstantin Belousov's message of 08.03.2017 00:55 (localtime):
> On Tue, Mar 07, 2017 at 10:49:01PM +0000, Rick Macklem wrote:
>> Hmm, this is going to sound dumb, but I don't recall generating any
>> unionfs patch;-)
>> I'll go look for it. Maybe it was Kostik's?
> I did not touch unionfs, and have no plans to. It is equally broken in
> all relevant versions of FreeBSD.

ACK. While this is no good news, I have more bad news: the deadlock came back.

I'd like to summarize, in case anybody else is interested in unionfs at any time in the future:

I observed locking problems back in 2012, and Attilio Rao's final attempt was this:
https://people.freebsd.org/~attilio/unionfs_nodeget4.patch
I never used it, most likely because it didn't work even back with RELENG_9. It applies to stable/11, but has no effect besides panicking KDB kernels.

What I used up to 10.3 was the following simple patch:

--- src/sys/fs/unionfs/union_subr.c	(revision 231702)
+++ src/sys/fs/unionfs/union_subr.c	(working copy)
@@ -261,7 +261,9 @@ unionfs_nodeget(struct mount *mp, struct vnode *up
 		free(unp, M_UNIONFSNODE);
 		return (error);
 	}
+	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 	error = insmntque(vp, mp);	/* XXX: Too early for mpsafe fs */
+	VOP_UNLOCK(vp, 0);
 	if (error != 0) {
 		free(unp, M_UNIONFSNODE);
 		return (error);

This hasn't led to any panic or deadlock during the last 5 years on ~50 machines, up to 10.3.

In 2016 I did some tests with 11.0-Beta1, where this thread originated, and Rick kindly looked into it and provided the following patch:
https://lists.freebsd.org/pipermail/freebsd-stable/attachments/20160818/d1d1691d/attachment.obj
(Explanation:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-August/085294.html)

This also panics a KDB kernel (it works without KDB), but it at least influences the deadlock: when symlinks are involved, deadlocks are significantly postponed.

>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00982220e0
>>>> vpanic() at vpanic+0x186/frame 0xfffffe0098222160
>>>> kassert_panic() at kassert_panic+0x126/frame 0xfffffe00982221d0
>>>> witness_assert() at witness_assert+0x35a/frame 0xfffffe0098222230
>>>> __lockmgr_args() at __lockmgr_args+0x517/frame 0xfffffe00982222d0
>>>> vop_stdunlock() at vop_stdunlock+0x3b/frame 0xfffffe00982222f0
>>>> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe0098222320
>>>> unionfs_unlock() at unionfs_unlock+0x112/frame 0xfffffe0098222390
>>>> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe00982223c0
>>>> unionfs_nodeget() at unionfs_nodeget+0x3ef/frame 0xfffffe0098222470
>>>> unionfs_domount() at unionfs_domount+0x518/frame 0xfffffe00982226b0
>>>> vfs_donmount() at vfs_donmount+0xe37/frame 0xfffffe00982228f0
>>>> sys_nmount() at sys_nmount+0x72/frame 0xfffffe0098222930
>>>> amd64_syscall() at amd64_syscall+0x2f9/frame 0xfffffe0098222ab0
>>>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0098222ab0
>>>> --- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80086ecea, rsp =
>>>> 0x7fffffffe318, rbp = 0x7fffffffeca0 ---
>>> New discovery:
>>> Rick's latest patch causes a panic only with KDB. If I compile a kernel
>>> without WITNESS and KDB, the machine boots fine!
>>> Also, it's at least not so easy anymore to trigger the deadlock :-) . I
>>> need to do more testing, but until now Rick's approach seems very
>>> promising :-) .
>>
>> My unionfs deadlock problem isn't really solved with Rick's latest
>> patch, I can still reproduce it: krb5.conf and krb5.keytab are files on
>> unionfs referenced by /etc. libexec/negotiate_kerberos_auth reads these
>> and if I have enough helper processes handling requests, the deadlock
>> occurs.
>>
>> _But_: If I move the files outside the unionfs and create a symlink, I
>> cannot reproduce the deadlock anymore, which was similarly easy to
>> reproduce without it or any of the other workarounds.

Picture has changed, the machine deadlocked overnight. So it does have a significant influence, but unfortunately it isn't the real solution.

Thanks for any help,

-harry
Rick Macklem
2017-Mar-09 22:49 UTC
unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905]
Konstantin Belousov wrote:
> I did not touch unionfs, and have no plans to. It is equally broken in
> all relevant versions of FreeBSD.

Heh, heh. I chuckled when I read this. I think he's trying to say "it probably won't ever be fixed".

My understanding is that fixing unionfs would require a major redesign of the FreeBSD VFS to make it fully stackable, and that isn't happening anytime soon...

Harry Schmalzbauer wrote:
> In 2016 I did some tests with 11.0-Beta1, where this thread originated, and
> Rick kindly looked into it and provided the following patch:
> https://lists.freebsd.org/pipermail/freebsd-stable/attachments/20160818/d1d1691d/attachment.obj
> (Explanation:
> https://lists.freebsd.org/pipermail/freebsd-stable/2016-August/085294.html)

Yep, I am guilty of creating this patch. All it tried to do was fix the crash. I don't know why a debug kernel with the patch crashes, but I might try and reproduce that.

> Picture has changed, the machine deadlocked overnight. So it does have
> a significant influence, but unfortunately isn't the real solution.

So, do you mean there is no longer any unionfs mount on the machine and it still hangs? If yes, then this should be looked at.

rick