Harald Schmalzbauer
2014-Jul-24 15:42 UTC
panic/lock on 9.3-RELEASE with nullfs/nfs/zfs combination
Hello,

I'm running 9.3-amd64 with some ZFS filesystems and a jail.

One ZFS filesystem is nullfs-mounted into the jail.

I can export (NFSv4) that nullfs-mounted filesystem, and opening a file read-write inside the jail from the nullfs-mounted fs works, until a client walks into the NFS-mounted filesystem (e.g. just listing directory contents).
mount shows this:

tank/my/fs15 mounted on /zfs/netshares/fs15 (zfs, NFS exported, local, noatime, noexec, nosuid, nfsv4acls)
/zfs/netshares/fs15 on /.JAIL/usr/ports (nullfs, local)

When I then try to open a file (rw) inside the jail from the nullfs-mounted filesystem, 9.3-RELEASE blocks all I/O on that filesystem completely (local or remote).
With a debug kernel I get the following panic on the NFS/jail server:

panic: LK_RETRY set with incompatible flags (0x200400) or an error occured (11)
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff82e54bcc70
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffff82e54bcd30
panic() at panic+0x1cd/frame 0xffffff82e54bce30
_vn_lock() at _vn_lock+0x67/frame 0xffffff82e54bce90
zfs_lookup() at zfs_lookup+0x420/frame 0xffffff82e54bcf20
zfs_freebsd_lookup() at zfs_freebsd_lookup+0xa6/frame 0xffffff82e54bd070
VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0xd8/frame 0xffffff82e54bd0a0
vfs_cache_lookup() at vfs_cache_lookup+0xff/frame 0xffffff82e54bd110
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0xd8/frame 0xffffff82e54bd140
null_lookup() at null_lookup+0x92/frame 0xffffff82e54bd1c0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0xd8/frame 0xffffff82e54bd1f0
lookup() at lookup+0x389/frame 0xffffff82e54bd290
namei() at namei+0x3df/frame 0xffffff82e54bd340
vn_open_cred() at vn_open_cred+0x1e2/frame 0xffffff82e54bd4b0
vop_stdvptocnp() at vop_stdvptocnp+0x1af/frame 0xffffff82e54bd7e0
null_vptocnp() at null_vptocnp+0xf5/frame 0xffffff82e54bd850
VOP_VPTOCNP_APV() at VOP_VPTOCNP_APV+0xdb/frame 0xffffff82e54bd880
vn_vptocnp_locked() at vn_vptocnp_locked+0x15b/frame 0xffffff82e54bd910
vn_fullpath1() at vn_fullpath1+0x100/frame 0xffffff82e54bd970
kern___getcwd() at kern___getcwd+0xd4/frame 0xffffff82e54bd9d0
amd64_syscall() at amd64_syscall+0x318/frame 0xffffff82e54bdaf0
Xfast_syscall() at Xfast_syscall+0xf7/frame 0xffffff82e54bdaf0
--- syscall (326, FreeBSD ELF64, sys___getcwd), rip = 0x8011a191c, rsp 0x7fffffffe658, rbp = 0x801873400 ---
KDB: enter: panic
[ thread pid 1905 tid 100856 ]
Stopped at kdb_enter+0x3b: movq $0,0x642172(%rip)

As mentioned, this panic happens only if an NFSv4 client visits fs15 (the exported and nullfs-mounted fs) and I afterwards try to open any file on the nullfs read-write.

How can I provide useful info with KDB? I don't have a dumpdev available in that machine.
http://www.es.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
does not seem applicable, since there are no /var/crash/* files.

Thanks for help,

-Harry
Konstantin Belousov
2014-Jul-24 16:59 UTC
panic/lock on 9.3-RELEASE with nullfs/nfs/zfs combination
On Thu, Jul 24, 2014 at 05:42:43PM +0200, Harald Schmalzbauer wrote:
> [...]
> When I then try to open a file (rw) inside the jail from the
> nullfs-mounted filesystem, 9.3-RELEASE blocks all I/O on that
> filesystem completely (local or remote).
> With a debug kernel I get the following panic on the NFS/jail server:
>
> panic: LK_RETRY set with incompatible flags (0x200400) or an error
> occured (11)
> [...]
> How can I provide useful info with KDB? I don't have a dumpdev available
> in that machine.

The lockmgr flags are LK_SHARED | LK_RETRY, and error 11 == EDEADLK indicates that the lock is already held by curthread in exclusive mode. I am interested in which line of code did the locking.

Add the DDB, INVARIANTS, WITNESS and DEBUG_VFS_LOCKS options to the kernel config and reproduce the issue. After the panic has occurred and you are at the ddb prompt, issue the command 'show alllocks'. Also do 'show mount', and then 'show mount <addr>', where <addr> is the address of your nullfs mount point as printed by 'show mount'.

I need all console output starting from the panic message.
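
For readers following along, the reading of the flag word 0x200400 in the panic message can be reproduced with a small sketch. The bit values below are an assumption: they are meant to mirror the LK_* definitions in sys/sys/lockmgr.h of a FreeBSD 9.x tree and should be checked against the actual header. The helper name decode_lk_flags is hypothetical and not part of any FreeBSD tool.

```python
# Assumed LK_* bit values, intended to mirror sys/sys/lockmgr.h in FreeBSD 9.x.
# Verify against your own source tree before relying on them.
LK_FLAGS = {
    0x000100: "LK_INTERLOCK",
    0x000200: "LK_NOWAIT",
    0x000400: "LK_RETRY",
    0x000800: "LK_SLEEPFAIL",
    0x080000: "LK_EXCLUSIVE",
    0x100000: "LK_RELEASE",
    0x200000: "LK_SHARED",
    0x400000: "LK_UPGRADE",
}


def decode_lk_flags(flags):
    """Return a 'A | B' string naming the known bits set in flags.

    Bits not present in LK_FLAGS are reported as one trailing hex value.
    """
    names = []
    leftover = flags
    # Walk from the highest known bit down so the lock type comes first.
    for bit in sorted(LK_FLAGS, reverse=True):
        if flags & bit:
            names.append(LK_FLAGS[bit])
            leftover &= ~bit
    if leftover:
        names.append(hex(leftover))
    return " | ".join(names) if names else "0"


if __name__ == "__main__":
    # Flag word taken from the panic message in this thread.
    print(decode_lk_flags(0x200400))
```

With the assumed table, the value from the panic decodes to LK_SHARED | LK_RETRY, which matches the interpretation given in the reply. The error number is not decoded here because errno values differ between systems; the reply states that 11 is EDEADLK on FreeBSD.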