Harry Schmalzbauer
2017-Mar-07 18:44 UTC
unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905]
Regarding Harry Schmalzbauer's message of 07.03.2017 13:42 (localtime):
> Something ufs related seems to have tightened the unionfs locking
> problem in stable/11. Now the machine instantaneously panics during
> boot after mounting root with Rick's latest patch.
>
> Unfortunately I don't have SWAP available on that machine (yet), but
> maybe this is a hint for anybody.
>
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00982220e0
> vpanic() at vpanic+0x186/frame 0xfffffe0098222160
> kassert_panic() at kassert_panic+0x126/frame 0xfffffe00982221d0
> witness_assert() at witness_assert+0x35a/frame 0xfffffe0098222230
> __lockmgr_args() at __lockmgr_args+0x517/frame 0xfffffe00982222d0
> vop_stdunlock() at vop_stdunlock+0x3b/frame 0xfffffe00982222f0
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe0098222320
> unionfs_unlock() at unionfs_unlock+0x112/frame 0xfffffe0098222390
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe00982223c0
> unionfs_nodeget() at unionfs_nodeget+0x3ef/frame 0xfffffe0098222470
> unionfs_domount() at unionfs_domount+0x518/frame 0xfffffe00982226b0
> vfs_donmount() at vfs_donmount+0xe37/frame 0xfffffe00982228f0
> sys_nmount() at sys_nmount+0x72/frame 0xfffffe0098222930
> amd64_syscall() at amd64_syscall+0x2f9/frame 0xfffffe0098222ab0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0098222ab0
> --- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80086ecea, rsp = 0x7fffffffe318, rbp = 0x7fffffffeca0 ---

New discovery:
Rick's latest patch causes a panic only with KDB. If I compile a kernel without WITNESS and KDB, the machine boots fine!
Also, it's at least not so easy anymore to trigger the deadlock :-) . I need to do more testing, but so far Rick's approach seems very promising :-) .
Unfortunately I can't provide a fix or an explanation for why the KDB kernel panics and the non-KDB kernel doesn't, just the vague guess that the additional locking checks (KASSERT?), which would prevent more damage, are not in place in the non-KDB kernel. So I guess I'm in dangerous waters, but it definitely is a highly appreciated improvement for me and my very best bet for now (neither eliminating unionfs nor holding off 11 updates were real options for me, especially because unionfs isn't really working well on 10.3 either, it just avoids deadlocks in more environments)!

I tried the non-debug kernel because I browsed old unionfs discussions and, in desperation, gave Attilio Rao's patch a try, since I couldn't remember why I hadn't kept it locally:
https://people.freebsd.org/~attilio/unionfs_nodeget4.patch
(he tried to solve unionfs problems for RELENG_9 back in 2012:
https://lists.freebsd.org/pipermail/freebsd-stable/2012-November/070358.html)

It's still true that his patch leads to a panic with a debugging kernel only. The same patch without KDB allows the system to boot and start squid. But the result is the same as with plain r314856: the system deadlocks reproducibly. Also, the trace with his patch looks identical to the plain r314856 unionfs panic.

So I hope Rick or someone else can pick up the latest patch and polish it to make KDB kernels happy :-)
I can offer a small donation if that helps! Of course, I'll also provide KDB info if needed/helpful.

thanks,
-harry
Harry Schmalzbauer
2017-Mar-07 19:45 UTC
unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905]
Regarding Harry Schmalzbauer's message of 07.03.2017 19:44 (localtime):
> Regarding Harry Schmalzbauer's message of 07.03.2017 13:42 (localtime):
>> Something ufs related seems to have tightened the unionfs locking
>> problem in stable/11. Now the machine instantaneously panics during
>> boot after mounting root with Rick's latest patch.
>>
>> Unfortunately I don't have SWAP available on that machine (yet), but
>> maybe this is a hint for anybody.
>>
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00982220e0
>> vpanic() at vpanic+0x186/frame 0xfffffe0098222160
>> kassert_panic() at kassert_panic+0x126/frame 0xfffffe00982221d0
>> witness_assert() at witness_assert+0x35a/frame 0xfffffe0098222230
>> __lockmgr_args() at __lockmgr_args+0x517/frame 0xfffffe00982222d0
>> vop_stdunlock() at vop_stdunlock+0x3b/frame 0xfffffe00982222f0
>> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe0098222320
>> unionfs_unlock() at unionfs_unlock+0x112/frame 0xfffffe0098222390
>> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe00982223c0
>> unionfs_nodeget() at unionfs_nodeget+0x3ef/frame 0xfffffe0098222470
>> unionfs_domount() at unionfs_domount+0x518/frame 0xfffffe00982226b0
>> vfs_donmount() at vfs_donmount+0xe37/frame 0xfffffe00982228f0
>> sys_nmount() at sys_nmount+0x72/frame 0xfffffe0098222930
>> amd64_syscall() at amd64_syscall+0x2f9/frame 0xfffffe0098222ab0
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0098222ab0
>> --- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80086ecea, rsp = 0x7fffffffe318, rbp = 0x7fffffffeca0 ---
> New discovery:
> Rick's latest patch causes a panic only with KDB. If I compile a kernel
> without WITNESS and KDB, the machine boots fine!
> Also, it's at least not so easy anymore to trigger the deadlock :-) . I
> need to do more testing, but so far Rick's approach seems very
> promising :-) .

My unionfs deadlock problem isn't really solved with Rick's latest patch; I can still reproduce it:
krb5.conf and krb5.keytab are files on unionfs referenced by /etc. libexec/negotiate_kerberos_auth reads these, and if I have enough helper processes handling requests, the deadlock occurs.

_But_: If I move the files outside the unionfs and create a symlink, I can no longer reproduce the deadlock, which was similarly easy to reproduce without it or any of the other workarounds.

So it looks like I have an acceptable solution for now, although it's only usable under certain conditions.

Unfortunately I can't run tests with a debug kernel, since the patch prevents the system with the debug kernel from starting up. But if this were ironed out, I'd happily provide more info.

Thanks,
-Harry
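
The workaround of moving a file off the unionfs and leaving a symlink at its old path could be scripted roughly as follows. This is only a minimal sketch: the helper name `relocate_and_symlink` and the temporary directories are hypothetical stand-ins for the unionfs mount and a plain filesystem, not paths taken from the affected machine.

```python
import os
import shutil
import tempfile


def relocate_and_symlink(path, target_dir):
    """Move `path` into `target_dir` and leave a symlink at the old location.

    Hypothetical helper: `path` stands for a file that currently lives on
    the unionfs, `target_dir` for a directory on a non-unionfs filesystem.
    """
    os.makedirs(target_dir, exist_ok=True)
    new_path = os.path.join(target_dir, os.path.basename(path))
    shutil.move(path, new_path)  # the file now lives in target_dir
    os.symlink(new_path, path)   # the old name keeps resolving via the symlink
    return new_path


if __name__ == "__main__":
    # Demo in throwaway directories only; nothing on the real system is touched.
    with tempfile.TemporaryDirectory() as union_dir, \
            tempfile.TemporaryDirectory() as plain_dir:
        conf = os.path.join(union_dir, "krb5.conf")
        with open(conf, "w") as f:
            f.write("[libdefaults]\n")
        relocate_and_symlink(conf, plain_dir)
        print(conf, "->", os.readlink(conf))
```

Readers of the old path then follow the symlink to the relocated file, so the file's contents are no longer served from the unionfs. This only sidesteps the deadlock for those particular files; it does not address the underlying unionfs locking problem.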