Peter Auyeung
2015-Jan-26 00:26 UTC
[Gluster-users] [Gluster-devel] lockd: server not responding, timed out
Hi Niels,

The question is why we keep getting the lockd error even after restarting and rebooting the NFS client.

Peter
________________________________________
From: Niels de Vos [ndevos at redhat.com]
Sent: Saturday, January 24, 2015 3:26 AM
To: Peter Auyeung
Cc: gluster-users at gluster.org; gluster-devel at gluster.org
Subject: Re: [Gluster-devel] [Gluster-users] lockd: server not responding, timed out

On Fri, Jan 23, 2015 at 11:50:26PM +0000, Peter Auyeung wrote:
> We have a 6-node Gluster cluster running Ubuntu on XFS, sharing Gluster
> volumes over NFS; it has been running fine for 3 months.
>
> We restarted glusterfs-server on one of the nodes and all NFS clients
> started getting "lockd: server not responding, timed out" in
> /var/log/messages.
>
> We are still able to read and write, but processes that require a
> persistent file lock fail, such as database exports.
>
> We have an interim fix of remounting the NFS exports with the nolock
> option, but we need to know why that is suddenly necessary after a
> 'service glusterfs-server restart' on one of the Gluster nodes.

The reason you need to mount with 'nolock' is that one server can only have one NLM service active. The Linux NFS client uses the 'lockd' kernel module, and the Gluster/NFS server provides its own lock manager. To be able to use a lock manager, it needs to be registered with rpcbind/portmapper. Only one lock manager can be registered at a time; the second one that tries to register will fail. If the NFS client has registered the lockd kernel module as the lock manager, any locking requests to the Gluster/NFS service will fail and you will see those messages in /var/log/messages.

This is one of the main reasons why it is not advised to access volumes over NFS on a Gluster storage server. You should rather use the GlusterFS protocol for mounting volumes locally. (Or even better, separate your storage servers from the application servers.)

HTH,
Niels
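The registration conflict Niels describes is visible with rpcinfo: the NLM lock manager registers as the 'nlockmgr' RPC program (number 100021), and whichever service registers it first owns locking on that host. A minimal sketch, parsing sample 'rpcinfo -p' output (the port numbers here are made up for illustration):

```shell
# Sample 'rpcinfo -p' output on a storage server where a lock manager
# already holds the NLM registration (ports are illustrative).
rpcinfo_sample='   100021    1   udp  39697  nlockmgr
   100021    3   tcp  45661  nlockmgr'

# Program 100021 is NLM; if its port does not show up in netstat/ss,
# the registration belongs to a kernel service such as lockd, and a
# second lock manager (e.g. Gluster/NFS) cannot register.
echo "$rpcinfo_sample" | grep -c '100021.*nlockmgr'
```

On a live system you would run `rpcinfo -p` directly. The interim workaround from the thread is mounting with locking disabled, e.g. `mount -o nolock server:/volume /mnt` (hostname and paths are placeholders).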
Niels de Vos
2015-Jan-26 12:37 UTC
[Gluster-users] [Gluster-devel] lockd: server not responding, timed out
On Mon, Jan 26, 2015 at 12:26:53AM +0000, Peter Auyeung wrote:
> Hi Niels,
>
> The question is why we keep getting the lockd error even after
> restarting and rebooting the NFS client.

This particular error would only occur when the NFS server could not register the nlockmgr RPC program with rpcbind/portmapper. The most likely scenario where this fails is when there is an NFS client (or service) on the storage server that conflicts with the Gluster/NFS service.

If there are conflicting RPC services in rpcbind/portmapper, you may be able to check and remove those with the 'rpcinfo' command. Ports that are listed in its output, but are not listed in netstat/ss, are in use by kernel services (like the lockd kernel module).

In order to restore the NLM function of Gluster/NFS, you can take these steps:

1. Ensure that there are no other NFS services (server or client) running on the Gluster storage server. Gluster/NFS should be the only service doing NFS on that server.
2. Stop the rpcbind service.
3. Clear the rpcbind cache (rm /var/lib/rpcbind/portmap.xdr).
4. Start the rpcbind service.
5. Restart the Gluster/NFS service.

In case your NFS client got connected to the incorrect NLM service on your storage server, you would need to unmount and mount the export again.

Niels
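The recovery steps Niels lists can be sketched as a dry-run shell script. The rpcbind and glusterfs-server service names match the Ubuntu setup described earlier in the thread; 'nfs-kernel-server' in step 1 is an assumption about which conflicting NFS service might be installed on the node.

```shell
#!/bin/sh
# Dry-run sketch of the NLM recovery steps from the thread.
# Each step is echoed rather than executed; removing the 'echo'
# in run() would perform the steps for real (requires root).
run() { echo "would run: $*"; }

# 1. ensure no other NFS service runs on this storage node
#    (service name is an assumption; check your installed packages)
run service nfs-kernel-server stop
# 2. stop the rpcbind service
run service rpcbind stop
# 3. clear the rpcbind cache
run rm -f /var/lib/rpcbind/portmap.xdr
# 4. start the rpcbind service again
run service rpcbind start
# 5. restart the Gluster/NFS service
run service glusterfs-server restart
```

Afterwards, clients that had latched onto the wrong lock manager need a plain umount and mount of the export. On the storage node itself, a native mount such as `mount -t glusterfs server1:/volname /mnt/volname` (hostname and volume name are placeholders) avoids the NLM conflict entirely, as Niels suggests.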