[please honour Mail-Followup-To:, no need to keep the crosspost]

This is a repost of
http://docs.FreeBSD.org/cgi/mid.cgi?20041014110752.GA57541, with some
additional information. I've updated the client to RC1, and the problem
still persists.

In short, a 5.3-RC1 client mounting /home off a 4.10-p3 server can't use
the NFS fs anymore when trying to start GNOME, since gconfd and
gnome-session are stuck in nfsfsync state. Any process accessing the fs
hangs, and the console fills up with

nfs server grummit:/fs/home/mount: not responding

messages, even though the client can still ping the server and other
mount points are still available.

AFAICT, nfsd and friends are running both on the client and the server,
and the client can use RPC properly (checked via rpcinfo).

Also, doing 'tcpdump -vv -s 192 port nfs' on the client and the server
seems to support the hypothesis of a locking issue, since I see a write
request for the same fh repeating over and over.

The trace of gnome-session is as follows:

db> tr 610
sched_switch(c180b4b0,0,1,11d,27b8ea4) at sched_switch+0x190
mi_switch(1,0,c063d701,19d,2) at mi_switch+0x2ac
sleepq_switch(c216d23c,c0639f0f,18e,2,da518a5c) at sleepq_switch+0x134
sleepq_wait(c216d23c,0,c063b2f5,db,0) at sleepq_wait+0x41
msleep(c216d23c,c216d210,4d,c1906703,0) at msleep+0x3b5
nfs_flush(c216d210,c17fed00,1,c180b4b0,0) at nfs_flush+0x961
nfs_close(da518b8c,1,c0643a5e,140,c0681da0) at nfs_close+0x7e
vn_close(c216d210,2,c17fed00,c180b4b0,c0692c20) at vn_close+0x67
vn_closefile(c1c2b6e8,c180b4b0,c0637a98,829,c1c2b6e8) at vn_closefile+0xc4
fdrop_locked(c1c2b6e8,c180b4b0,c0637a98,768) at fdrop_locked+0xb4
fdrop(c1c2b6e8,c180b4b0,3,c180b4b0,da518c98) at fdrop+0x3c
closef(c1c2b6e8,c180b4b0,c0637a98,3e3,0) at closef+0x21c
close(c180b4b0,da518d14,4,431,1) at close+0x135
syscall(2f,2f,2f,0,28d38ec0) at syscall+0x272
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (6, FreeBSD ELF32, close), eip = 0x28ca1e6f, esp = 0xbfbfe52c, ebp = 0xbfbfe538 ---

I have a debugging kernel and a console attached, so feel free to ask
for any other information of interest.

This is driving me nuts, and I'm surely not the only one using GNOME
over NFS. Is anyone else seeing this? What exactly is going on? How can
I fix it? It might be that the problem appeared going from BETA3 to
BETA6, but I've been unable to "downgrade" the workstation; where can I
get a copy of BETA3 to test this?

tks
--
pica

* Ken Smith <kensmith@cse.Buffalo.EDU> [20041025 05:49]:
> On Mon, Oct 25, 2004 at 02:20:08AM +0200, Joan Picanyol wrote:
>
> > This is driving me nuts, and I'm surely not the only one using GNOME
> > over NFS, is anyone else seeing this? What exactly is going on? How can
> > I fix it? It might be that the problem appeared going from BETA3 to
> > BETA6, but I've been unable to "downgrade" the workstation; where can I
> > get a copy of BETA3 to test this?
>
> Are you running the lock daemon on the server?

Yes, on both client and server, and the client can see it running too:

joan@calvin:~(0)$ rpcinfo -s grummit
   program version(s) netid(s)       service     owner
    100000  2          udp,tcp        portmapper  unknown
    100004  2,1        tcp,udp        ypserv      unknown
    100005  1,3        tcp,udp        mountd      unknown
    100003  3,2        tcp,udp        nfs         unknown
    100021  4,3,1      tcp,udp        nlockmgr    unknown
    100024  1          tcp,udp        status      unknown

joan@calvin:~(0)$ rpcinfo -s calvin
   program version(s) netid(s)       service     owner
    100000  2,3,4      local,udp,tcp  portmapper  superuser
    100007  2          tcp,udp        ypbind      superuser
    100024  1          tcp,udp        status      superuser
    100021  4,3,1,0    tcp,udp        nlockmgr    superuser

tks
--
pica

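For anyone wanting to repeat the check above on several hosts, here is a
minimal sketch that scans saved `rpcinfo -s` output for the services an NFS
server with locking would normally register. The function names and the
REQUIRED list are illustrative assumptions, not part of rpcinfo or FreeBSD,
and the parsing assumes the five-column layout shown in the listings above.

```python
# Hypothetical helper, not part of rpcinfo: checks saved `rpcinfo -s` text.
# Assumption: rows have the columns program, version(s), netid(s), service, owner.

# Services one would expect on an NFS server with locking (an assumption).
REQUIRED = ("nfs", "mountd", "nlockmgr", "status")


def registered_services(rpcinfo_s_output):
    """Map each service name to the set of transports it is registered on."""
    services = {}
    for line in rpcinfo_s_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a data row.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        netids, service = fields[2], fields[3]
        services.setdefault(service, set()).update(netids.split(","))
    return services


def missing_services(rpcinfo_s_output, required=REQUIRED):
    """Return the required services that do not appear in the output."""
    found = registered_services(rpcinfo_s_output)
    return [name for name in required if name not in found]


if __name__ == "__main__":
    # Server listing copied from the message above.
    grummit = """\
program version(s) netid(s) service owner
100000 2 udp,tcp portmapper unknown
100004 2,1 tcp,udp ypserv unknown
100005 1,3 tcp,udp mountd unknown
100003 3,2 tcp,udp nfs unknown
100021 4,3,1 tcp,udp nlockmgr unknown
100024 1 tcp,udp status unknown
"""
    print("missing on grummit:", missing_services(grummit))
```

An empty list only says the daemons are registered with the portmapper; it
does not show that they are answering requests during the wedge.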
* Robert Watson <rwatson@FreeBSD.org> [20041025 10:42]:
>
> On Mon, 25 Oct 2004, Joan Picanyol wrote:
>
> > [please honour Mail-Followup-To:, no need to keep the crosspost]
>
> Hmm. Don't see one of those, maybe it was trimmed by Mailman?

Just checked, must have been. Maybe because it only had -stable@ on
it...

> > Also, doing 'tcpdump -vv -s 192 port nfs' on the client and the server
> > seems to support the hypothesis of a locking issue, since I see a write
> > request for the same fh repeating over and over.
>
> Is there a response to the request? If not, that might suggest the
> server is wedged, not the client. If you are willing to share the results
> of a tcpdump -s 1500 -w <whatever> output from a few seconds during the
> wedge, that would be very useful.

Available at http://biaix.org/pk/debug/nfs/

These are from just after logging in to GNOME until gconfd-2 goes to
nfsfsync and the "nfs server not responding" messages start appearing.

> Also useful would be the output of "netstat -na | grep 2049" on the client
> and server

Nothing special, it's the same before and after the wedge (grummit is
the server, calvin the client):

calvin# netstat -na | grep 2049
udp4       0      0  192.168.124.9.943      192.168.124.1.2049
udp4       0      0  192.168.124.9.600      192.168.124.1.2049

joan@grummit:~(0)$ netstat -an | grep 2049
tcp4       0      0  *.2049                 *.*                    LISTEN
udp4       0      0  *.2049                 *.*

FYI, calvin is an SMP box, debug.mpsafenet=0 and the NIC is an xl.

tks
--
pica
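The client-side netstat listing above shows only udp4 sockets talking to
port 2049 on the server. As a rough sketch of how that could be read off
mechanically, the following parses saved `netstat -na` lines and reports
which protocols have a foreign address on the NFS port. The function name is
hypothetical, and the column layout (proto, recv-q, send-q, local, foreign,
optional state) is assumed from the output shown above.

```python
# Hypothetical helper, not part of netstat: inspects saved `netstat -na` text.
# Assumption: columns are proto, recv-q, send-q, local, foreign, [state],
# and addresses use the BSD "a.b.c.d.port" form seen in the listing above.


def nfs_transports(netstat_output, port="2049"):
    """Return the protocols whose foreign address ends in the given port."""
    protos = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a socket row
        proto, foreign = fields[0], fields[4]
        # The port is whatever follows the last dot of the address.
        if foreign.rsplit(".", 1)[-1] == port:
            protos.add(proto)
    return protos


if __name__ == "__main__":
    # Client listing copied from the message above.
    calvin = """\
udp4 0 0 192.168.124.9.943 192.168.124.1.2049
udp4 0 0 192.168.124.9.600 192.168.124.1.2049
"""
    print(sorted(nfs_transports(calvin)))
```

Listening sockets on the server have `*.*` as their foreign address, so they
are deliberately not counted; the helper only looks at sockets pointed at a
remote NFS port.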