Ahmed Kamal
2015-Jul-01 22:04 UTC
Linux NFSv4 clients are getting (bad sequence-id error!)
Hi all,

*warning*: Sorry, I'm cross-posting this from freebsd-fs; things are too quiet there, unfortunately.

I'm a refugee from Linux land. I just set up my first FreeBSD 10.1 ZFS box, sharing /home over NFS. Since every home directory is its own ZFS dataset, I chose NFSv4 so that any directory under /home can be shared and mounted recursively (I understand NFSv4 is a must in this scenario!). I'm able to mount from Linux (RHEL5, latest kernel) successfully, and users are working fine. However, every now and then a user screams that his session is frozen. Usually the processes are stuck in nfs_wait or rpc_* state. I tried a much newer Linux kernel (3.2), but it still faced the same problem.

The errors in the Linux log files are mostly:

Jul 1 17:41:47 mammoth kernel: NFS: v4 server nas returned a *bad sequence-id error*!
Jul 1 17:52:32 mammoth kernel: nfs4_reclaim_locks: unhandled error -11. Zeroing state
Jul 1 17:52:32 mammoth kernel: nfs4_reclaim_open_state: Lock reclaim failed!

My search led me to a detailed analysis of the issue (https://access.redhat.com/solutions/1328073), which you can read here: https://dl.dropboxusercontent.com/u/51939288/nfs4-bad-seq.pdf

NetApp confirmed this was a bug for them (I'm wondering if this is still in FreeBSD?!)

PS: Right before sending this, I saw dmesg on the FreeBSD box advising me to increase vfs.nfsd.tcphighwater, so I raised that to 64000. I also raised the number of NFS server threads (-t) from 10 to 60 (we're roughly 40 Linux machines).

Any advice is most appreciated!

Thanks
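PPS: In case it helps anyone correlate the freezes with the client logs, here is a minimal Python sketch that tallies the three kernel messages quoted above. The function name, the category labels, and the idea of counting per message type are assumptions for illustration only; the patterns are taken from the log lines in this post and may not match other kernel versions.

```python
import re
from collections import Counter

# Patterns derived from the kernel messages quoted in this post.
# The dictionary keys are hypothetical labels, not kernel identifiers.
PATTERNS = {
    # \*? tolerates the mail-client emphasis asterisks seen in the quoted line.
    "bad_sequence_id": re.compile(r"returned a \*?bad sequence-id error"),
    "reclaim_locks_unhandled": re.compile(r"nfs4_reclaim_locks: unhandled error"),
    "lock_reclaim_failed": re.compile(r"nfs4_reclaim_open_state: Lock reclaim failed"),
}


def tally_nfs4_errors(lines):
    """Count how many lines match each of the NFSv4 error patterns."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

Since a file object is an iterable of lines, it could be called as `tally_nfs4_errors(open(path))`, with `path` being wherever the client writes its kernel log (an assumption; this varies by distribution).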