Greetings!

I'm having a problem with nfsd hanging and not serving mount points, during which time it cannot be killed. This problem started happening sometime after November 2nd, since a kernel built from 11/2 sources does not exhibit this problem.

The current kernel I'm running is built from SVN sources I grabbed this evening (around 5pm PDT on November 4th), but I was having the same problem yesterday around 9pm PDT after a csup (I switched to SVN today to rule out a stale /usr/src from an out of sync cvsup mirror). Here are the svn details:

Path: /usr/src
URL: svn://svn.freebsd.org/base/stable/8
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 214807
Node Kind: directory
Schedule: normal
Last Changed Author: jhb
Last Changed Rev: 214791
Last Changed Date: 2010-11-04 10:25:31 -0700 (Thu, 04 Nov 2010)

uname -a:

FreeBSD 8.1-STABLE FreeBSD 8.1-STABLE #0 r214807: Thu Nov 4 17:13:05 PDT 2010 root@pflog.net:/usr/obj/usr/src/sys/PFLOG amd64

I have a Popcorn Hour, and as soon as I try to connect to my NFS mount with it, it hangs on the Popcorn Hour, then eventually pops up a message that says "Request cannot be processed". Likewise, if I try to mount it from my macbook, it hangs for quite a while and then says operation timed out or something like that.

During this hang, there is nothing in /var/log indicating a problem, nor any other indication that something is wrong, except that none of my NFS mounts work and the nfsd process will not die.

When I try to reboot the server, I wind up having to fsck all my drives (except the ZFS one), since nfsd will not die. Even kill -9 doesn't kill it (it's showing as in the D state):

root 444 0.0 0.0 5812 1384 ?? D 9:30PM 0:00.00 nfsd: server (nfsd)

And if I try to /etc/rc.d/nfsd stop, it just says:

Stopping nfsd.
Waiting for PIDS: 444

And hangs there indefinitely.

I tried to run a ktrace on both the "nfsd: server" and "nfsd: master" processes (ktrace -i -d -f nfsd_server.ktrace and ktrace -i -d -f nfsd_master.ktrace), but when I try to connect to the NFS mount, ktrace doesn't capture anything, the "nfsd: server" process goes to the "D" state, and then I can't kill it.

If I try to kill the nfsd process BEFORE I attempt to mount anything, it properly stops with /etc/rc.d/nfsd stop or with a kill -TERM. Once I've tried to connect once, however, it can't be killed.

Hoping it was perhaps related to ZFS, I commented out the one ZFS mount point in /etc/exports, but it still causes this deadlock in the nfsd process. I even went as far as to comment out everything in /etc/exports and create a new export on a different disk, which did not help; I get the same nfsd hang.

Another strange thing: if I run truss on the "nfsd: server" process (the child) before I try to mount anything, it causes the process to exit immediately along with truss. If I look at what truss captured for it, I see:

411: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
411: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
411: process exit, rval = 0

My kernel built from sources on 11/2 works fine, so it's something that has changed sometime after November 2nd. At least, my kernel from November 2nd runs fine and does not have this nfsd lockup problem.

My kernel is just GENERIC with a few additions:

include GENERIC

device pf
device pflog
device coretemp
device uchcom
device sound
device snd_hda

option NETATALK
option ALTQ
option ALTQ_CBQ
option ALTQ_HFSC
option ALTQ_NOPCC
option ALTQ_PRIQ
option ALTQ_RED
option ALTQ_RIO
option COMPAT_LINUX32
option GEOM_MIRROR
option LIBICONV
option LIBMCHAIN
option NETSMB
option NULLFS
option SMBFS
option UDF

nooption INET6

If any other information is needed, please let me know. What are the next things I should be doing to diagnose the problem? It seems specific to nfsd, but I'm not sure how to prove it's that and not something related or complementary to nfsd. For what it's worth, rpcbind and mountd both stop fine; it's just the nfsd process that is locking up.

Thanks in advance. Any advice on troubleshooting or root-causing the issue would be appreciated.

Regards,
Josh

on 05/11/2010 07:35 Josh Carroll said the following:
> Greetings!
>
> I'm having a problem with nfsd hanging and not serving mount points,
> during which time it cannot be killed. This problem started
> happening sometime after November 2nd, since kernel from 11/2 sources
> does not exhibit this problem.
>
> The current kernel I'm running is via SVN I just grabbed this evening
> (around 5pm PDT on November 4th), but I was having the same problem
> yesterday around 9pm PDT after a csup yesterday (I switched to SVN
> today to rule out a stale /usr/src from an out of sync cvsup mirror).
> Here are the svn details:
>
> Path: /usr/src
> URL: svn://svn.freebsd.org/base/stable/8
> Repository Root: svn://svn.freebsd.org/base
> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> Revision: 214807
> Node Kind: directory
> Schedule: normal
> Last Changed Author: jhb
> Last Changed Rev: 214791
> Last Changed Date: 2010-11-04 10:25:31 -0700 (Thu, 04 Nov 2010)
>
> uname -a:
>
> FreeBSD 8.1-STABLE FreeBSD 8.1-STABLE #0 r214807: Thu Nov 4 17:13:05
> PDT 2010 root@pflog.net:/usr/obj/usr/src/sys/PFLOG amd64
>
> I have a Popcorn Hour, and as soon as I try to connect to my NFS mount
> with it, it hangs on the Popcorn Hour, then eventually pops up a
> message that says "Request cannot be processed". Likewise if I try to
> mount it from my macbook, it hangs then later just says operation
> timed out or something like that, after it hangs for quite a while.
>
> During this hang, there is nothing in /var/log indicating a problem
> nor any other indications something is wrong, except that none of my
> NFS mounts work and the nfsd process will not die.
>
> When I try to reboot the server, I wind up having to fsck all my
> drives (except the ZFS one), since nfsd will not die. Even kill -9
> doesn't kill it (it's showing as in the D state):
>
> root 444 0.0 0.0 5812 1384 ?? D 9:30PM 0:00.00 nfsd: server (nfsd)

You can try 'procstat -kk <pid>' next time this happens.

-- 
Andriy Gapon

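The procstat suggestion above could be scripted roughly as follows. This is only a minimal sketch, assuming a FreeBSD host where procstat(1) is available and the script is run as root. The helper name stuck_nfsd_pids is hypothetical; it relies on STAT being the 8th column and the command starting at the 11th column of `ps aux` output, as in the ps line quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read `ps aux`-style lines on
# stdin and print the PID (column 2) of every process whose STAT (column 8)
# starts with "D" and whose command (column 11) starts with "nfsd".
stuck_nfsd_pids() {
    awk '$8 ~ /^D/ && $11 ~ /^nfsd/ { print $2 }'
}

# Only attempt the live capture where procstat(1) exists (FreeBSD).
if command -v procstat >/dev/null 2>&1; then
    ps aux | stuck_nfsd_pids | while read -r pid; do
        # -kk prints the kernel stack of each thread, with function offsets.
        procstat -kk "$pid"
    done
fi
```

The kernel stack shows where in the kernel the thread is sleeping, which is usually the most useful detail to post to the list for a process stuck in the D state.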
> Greetings!
>
> I'm having a problem with nfsd hanging and not serving mount points,
> during which time it can not not be killed. This problem started
> happening sometime after November 2nd, since kernel from 11/2 sources
> does not exhibit this problem.

Please try the attached patch, rick

ps: Starting about Monday I won't be able to do commits for about 3 weeks so, if this patch works, could someone else please commit it, thanks, rick

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nfs_serv.patch
Type: text/x-patch
Size: 385 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20101105/e1dabbc3/nfs_serv.bin

Richard A Steenbergen
2010-Nov-05 22:33 UTC
NFS deadlock (unkillable nfsd and no mounts work)
On Thu, Nov 04, 2010 at 10:35:15PM -0700, Josh Carroll wrote:
> Greetings!
>
> I'm having a problem with nfsd hanging and not serving mount points,
> during which time it cannot be killed. This problem started
> happening sometime after November 2nd, since kernel from 11/2 sources
> does not exhibit this problem.

I had a similar issue on -current a few weeks ago, with processes that would lock up and become unkillable when they tried to access certain parts of the filesystem (running all zfs here). One time it managed to lock up every time you'd do an ls /, but a reboot would always clear it, then a few days later it would pop up again somewhere else. I never lost any data, zfs never found anything wrong, and the drives and hw all checked out.

I sync'd up with the latest -current on Oct 18th and it stopped happening (or maybe I just stopped noticing it, which is entirely possible for a very lightly used personal box). I was also traveling and super busy at the time, so I didn't bother pursuing it further.

-- 
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)