Tomasz Chmielewski
2012-Jul-12 07:56 UTC
[Gluster-users] NFS mounts with glusterd on localhost - reliable or not?
Hi,

are NFS mounts made on a single server (i.e. where glusterd is running) supposed to be stable (with gluster 3.2.6)?

I'm using the following line in /etc/fstab:

localhost:/sites /var/ftp/sites nfs _netdev,mountproto=tcp,nfsvers=3,bg 0 0

The problem is, after some time (~1-6 hours), I'm no longer able to access this mount. dmesg says:

[49609.832274] nfs: server localhost not responding, still trying
[49910.639351] nfs: server localhost not responding, still trying
[50211.446433] nfs: server localhost not responding, still trying

What's worse, whenever this happens, *all* other servers in the cluster (it's a 10-server distributed volume) destabilise: their load average grows, and eventually their gluster mounts become unresponsive too (the other servers use native gluster mounts).

At that point, I have to kill all gluster processes, start glusterd again, and remount (on the servers using the gluster mount).

Is this expected behaviour with gluster and NFS mounts on localhost? Could it be caused by some kind of deadlock? Any workarounds?

--
Tomasz Chmielewski
http://www.ptraveler.com
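For reference, the manual recovery I described can be sketched roughly as follows (process names and the init script path are assumptions based on typical 3.2.x packaging; adjust for your distro):

```shell
# Recovery sketch for a hung localhost NFS mount of a gluster volume.
# Paths are from my setup; the init-script location is distro-dependent.

# Force and lazy-unmount the stuck NFS mount (a plain umount just hangs).
umount -f -l /var/ftp/sites

# Kill all gluster processes: glusterd (management daemon),
# glusterfsd (brick servers), glusterfs (NFS server / clients).
pkill -9 glusterd
pkill -9 glusterfsd
pkill -9 glusterfs

# Start the management daemon again; it respawns the bricks and the
# built-in gluster NFS server.
/etc/init.d/glusterd start

# Remount once the gluster NFS server has registered with the portmapper.
mount /var/ftp/sites
```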
James Kahn
2012-Jul-13 06:59 UTC
[Gluster-users] NFS mounts with glusterd on localhost - reliable or not?
Try 3.3.0 - 3.2.6 has issues with NFS in general (memory leaks, etc).

-----Original Message-----
From: Tomasz Chmielewski <mangoo at wpkg.org>
Date: Thursday, 12 July 2012 5:56 PM
To: Gluster General Discussion List <gluster-users at gluster.org>
Subject: [Gluster-users] NFS mounts with glusterd on localhost - reliable or not?
Krishna Srinivas
2012-Jul-19 09:16 UTC
[Gluster-users] NFS mounts with glusterd on localhost - reliable or not?
It was pretty confusing to read this thread. Hope I can clarify the questions here.

The original question by Tomasz was whether the behavior seen in https://bugzilla.redhat.com/show_bug.cgi?id=GLUSTER-2320 is still seen in 3.3.0. Yes, it is: the deadlock cannot be avoided and still occurs when the machine runs low on memory, because a write call by the gluster-nfs process triggers an nfs-client cache flush in the kernel, which in turn tries to write the cached data back to the already blocked glusterfs-nfs process. Hence, avoid this kind of setup.

The other discussion in this thread was related to NLM, which has been implemented in 3.3.0. It supports locking calls from NFS clients, i.e. fcntl() locking for applications running on the NFS client. An NLM server is implemented both in glusterfs and in the kernel. The kernel NLM server is used by the kernel nfsd as well as by the kernel NFS client; hence, if you have an NFS mount point, the kernel NFS client automatically starts the kernel NLM server.

So if the glusterfs-nfs process is already running on a system (and hence also running its own NLM server), and you try to do "mount -t nfs someserver:/export /mnt/nfs" on the same system, it fails: the kernel NFS client cannot start the kernel NLM server, because the glusterfs NLM server has already registered with the portmapper for the NLM service, so the kernel NLM server's registration with the portmapper fails.

The workaround is "mount -t nfs -o nolock someserver:/export /mnt/nfs" if you really want an NFS mount on the same machine where the glusterfs-nfs process is running.
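To illustrate the NLM registration conflict and the workaround, roughly (the server name and export path are placeholders, as in the example above):

```shell
# List services registered with the portmapper. While the glusterfs NFS
# server is running, its NLM server appears here as "nlockmgr".
rpcinfo -p localhost

# On the same machine, a plain kernel NFS mount fails because the kernel
# NFS client cannot register its own NLM server with the portmapper:
#   mount -t nfs someserver:/export /mnt/nfs        # fails
# Mounting with -o nolock skips starting the kernel NLM server:
mount -t nfs -o nolock someserver:/export /mnt/nfs

# Caveat: with nolock, fcntl() locks taken through this mount are local
# to this client and are not coordinated with other clients of the export.
```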