Jeremy Chadwick
2010-Mar-29 16:56 UTC
Strange NFS-related messages (related to lockd/statd)
I recently brought up rpc.lockd and rpc.statd on all of our NFS clients (mixed RELENG_6, RELENG_7, and RELENG_8), and our NFS server (RELENG_8).

All clients had nfs_client_enable="yes" in rc.conf prior to their last reboot, but lacked rpcbind_enable="yes", rpc_lockd_enable="yes", and rpc_statd_enable="yes" prior to the below.

The 8.x clients started rpcbind, rpc.lockd, rpc.statd -- then said:

  NLM: failed to contact remote rpcbind, stat = 0, port = 0
  Can't start NLM - unable to contact NSM

The 7.x clients started rpcbind, rpc.lockd, rpc.statd -- then said:

  Can't start NLM - unable to contact NSM

One of the 7.x clients also kernel panic'd when starting rpc.lockd, in some nlm_* kernel functions. Looking at commits showed that the bug that caused the panic was fixed in a later 7.x release.

The 6.x clients started rpcbind, rpc.lockd, rpc.statd -- and said nothing.

The above daemons were all started in that order, per the FreeBSD Handbook.

I can't find a definition of what the acronyms NLM and NSM stand for, nor does Googling the error messages return relevant results (except one FreeBSD committer reporting similar, but nobody replied). I don't know the implications of these messages.

The only thing I can think of that might cause such errors is the fact that these machines all have dual NICs with firewall rules applied only to their primary (WAN-side) interface. The NFS server exists only on the private (LAN-side) interface. I'm thinking rpcbind may have tried to "do stuff" on the WAN interface, since no -h option was applied.

I haven't tried making use of -h yet, nor have I tried restarting the daemons to see if the errors recur (or if it was just a one-time thing).

Any information/tips/advice would be appreciated. Thanks!

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
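For reference, the rc.conf settings named in the message above would look roughly like the fragment below. This is only a sketch: the rpcbind_flags line illustrates the untested -h idea from the message and is left commented out, and the address 192.0.2.10 is a placeholder standing in for a LAN-side IP, not a value from the thread.

```shell
# /etc/rc.conf fragment (sketch of the settings described above)
nfs_client_enable="yes"     # already present before the last reboot
rpcbind_enable="yes"        # added: rpcbind (formerly portmapper)
rpc_statd_enable="yes"      # added: rpc.statd
rpc_lockd_enable="yes"      # added: rpc.lockd

# Untested in the thread: restrict rpcbind to the LAN-side address with -h.
# 192.0.2.10 is a placeholder address (assumption), not from the original post.
#rpcbind_flags="-h 192.0.2.10"
```

Whether -h actually makes the messages go away is not established anywhere in this thread; see rpcbind(8) for what the option binds.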
On Mon, 29 Mar 2010, Jeremy Chadwick wrote:

> I can't find a definition of what the acronyms NLM and NSM stand for,
> nor does Googling the error messages return relevant results (except one
> FreeBSD committer reporting similar, but nobody replied). I don't know
> the implications of these messages.

NLM - Network Lock Manager
NSM - Network Status Monitor (I think?)

These two protocols (separate from NFS) were what Sun implemented in the 1980s to provide locking on NFS mount points. IMHO, these protocols were poorly designed:

- The NLM allows blocking locks at the server, which can cause assorted nasty issues when the client crashes or gets network partitioned.
- It also depended on the NSM to decide when machines were up/down, and the NSM protocol basically did this in a rather poor way.

A big part of NFSv4 was the integration of locking, in order to avoid use of the above. (As you might have guessed, lockd and statd implement the above two protocols.)

rick
On Mon, 29 Mar 2010, Jeremy Chadwick wrote:

> I recently brought up rpc.lockd and rpc.statd on all of our NFS clients
> (mixed RELENG_6, RELENG_7, and RELENG_8), and our NFS server (RELENG_8).
>
> All clients had nfs_client_enable="yes" in rc.conf prior to their last
> reboot, but lacked rpcbind_enable="yes", rpc_lockd_enable="yes", and
> rpc_statd_enable="yes" prior to the below.
>
> The 8.x clients started rpcbind, rpc.lockd, rpc.statd -- then said:
>
> NLM: failed to contact remote rpcbind, stat = 0, port = 0
> Can't start NLM - unable to contact NSM
>
> The 7.x clients started rpcbind, rpc.lockd, rpc.statd -- then said:
>
> Can't start NLM - unable to contact NSM

Oh, I forgot to mention... I can't help much, but these protocols/daemons are SunRPC, so they will be using portmapper (now called rpcbind) to get port #s assigned dynamically.

I also believe (not sure, don't know much about it) that the NSM will poll for other machines, so it needs to be able to talk to all clients and server(s), including doing IP broadcast that gets to them all. (These were designed in the 1980s for a LAN, which was just a chunk of coax in those days :-)

Hope this helps,
rick
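Since both daemons get their ports through rpcbind, one way to see whether a host's rpcbind is reachable and has them registered is to read its `rpcinfo -p` listing. The following is a minimal sketch, not something from the thread: `has_rpc_prog` and `check_lock_services` are hypothetical helper names, and it assumes the standard SunRPC program numbers 100024 (status, i.e. the NSM/rpc.statd) and 100021 (nlockmgr, i.e. the NLM/rpc.lockd).

```shell
#!/bin/sh
# Sketch: check whether the NSM and NLM RPC programs are registered with
# rpcbind on a given host, using the output of `rpcinfo -p`.

# has_rpc_prog PROGNUM
# Reads `rpcinfo -p` output on stdin; succeeds if any row's first column
# (the program number) equals PROGNUM.
has_rpc_prog() {
    awk -v prog="$1" '$1 == prog { found = 1 } END { exit !found }'
}

# check_lock_services HOST (hypothetical helper)
# Prints one line per service saying whether it is registered on HOST.
check_lock_services() {
    host="$1"
    out=$(rpcinfo -p "$host" 2>/dev/null) || {
        echo "rpcbind on $host did not answer"
        return 1
    }
    for entry in "100024 NSM/rpc.statd" "100021 NLM/rpc.lockd"; do
        prog=${entry%% *}   # program number before the first space
        name=${entry#* }    # label after the first space
        if printf '%s\n' "$out" | has_rpc_prog "$prog"; then
            echo "$name ($prog) registered on $host"
        else
            echo "$name ($prog) NOT registered on $host"
        fi
    done
}
```

Running `check_lock_services` against the server's LAN-side address from each client (and against each client from the server) would at least show whether rpcbind answers on the interface that matters. It does not exercise the NSM's own host monitoring, so a clean result here would not rule out the firewall theory.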