Hi,

since we switched to NFS (due to many small files) we are experiencing
heavy problems with Gluster's NFS daemon. About once a day, the Gluster
NFS process just crashes on one of the machines and doesn't come up
again until I issue a restart of the Gluster daemon on that node.
Sometimes the crashed node will even crash again after the restart.

We have a ~2TB volume with 6 bricks on 5 servers, accessed by 12 NFS
clients and one FUSE client.

In the NFS logs there's something like the following:

tail -n 100 /var/log/glusterfs/nfs.log
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
[...]
frame : type(0) op(0)

signal received: 11
time of crash: 2013-08-15 14:08:39
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0
/lib/x86_64-linux-gnu/libc.so.6(+0x364c0)[0x7fac361904c0]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_spin_lock+0x0)[0x7fac36523a50]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(fd_unref+0x36)[0x7fac36b96966]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/protocol/client.so(client_local_wipe+0x28)[0x7fac31f6a4f8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/protocol/client.so(client3_3_opendir_cbk+0x19c)[0x7fac31f8353c]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7fac36957bd5]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xc5)[0x7fac36957f35]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x27)[0x7fac36954627]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/rpc-transport/socket.so(+0xa1d1)[0x7fac32e091d1]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/rpc-transport/socket.so(+0xa81c)[0x7fac32e0981c]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5e553)[0x7fac36bbd553]
/usr/sbin/glusterfs(main+0x3e3)[0x7fac37007883]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fac3617b76d]
/usr/sbin/glusterfs(+0x5c79)[0x7fac37007c79]
---------

Is there anything we could do to prevent this, or at least something to
find the cause of it? At the moment we have the ugly workaround of
checking the NFS status via cron and restarting the server if necessary,
but that's not something we find suitable for larger deployments.
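The cron workaround described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: that probing the NFSv3 service with `rpcinfo -t localhost nfs 3` is a good enough health check, and that the Gluster daemon is managed by a service named `glusterfs-server`. Neither detail comes from the original message, and the function names are made up for this example.

```python
import subprocess

# Assumption: an RPC null call to the NFSv3 service over TCP is an adequate
# liveness probe for the Gluster NFS server on this node.
CHECK_CMD = ["rpcinfo", "-t", "localhost", "nfs", "3"]

# Assumption: the daemon runs under a service called "glusterfs-server";
# adjust the name if your packages use a different one.
RESTART_CMD = ["service", "glusterfs-server", "restart"]


def nfs_is_up(run=subprocess.run):
    """Return True when the health probe exits with status 0."""
    try:
        result = run(CHECK_CMD, stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # Probe binary missing or hung: treat the service as down.
        return False
    return result.returncode == 0


def check_and_restart(run=subprocess.run):
    """Restart the Gluster daemon if NFS is down.

    Returns True when a restart was attempted, False when NFS was healthy.
    """
    if nfs_is_up(run):
        return False
    try:
        run(RESTART_CMD, check=False)
    except OSError:
        # The restart command itself could not be started; nothing more to do.
        pass
    return True
```

Calling `check_and_restart()` from a small script that cron runs every few minutes reproduces the workaround. The `run` parameter exists only so the logic can be exercised without touching a real server. As noted in the message, this hides the crash rather than fixing it.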
This is already addressed in BZ 959190.

Thanks,
Santosh

On 09/24/2013 08:24 PM, Vijay Bellur wrote:
> Hi Santosh, Rajesh -
>
> Can you please respond to this thread?
>
> - Vijay
>
> -------- Original Message --------
> Subject: [Gluster-users] NFS crashes on Gluster 3.4.0
> Date: Tue, 24 Sep 2013 14:20:02 +0200
> From: Maik Kulbe <info at linux-web-development.de>
> Reply-To: Maik Kulbe <info at linux-web-development.de>
> To: gluster-users at gluster.org
>
> <snip>
On 09/24/2013 09:31 PM, Santosh Pradhan wrote:
> This is already addressed in BZ 959190.

Fix doesn't seem to be present in release-3.4. Can you please send out a
backport to release-3.4?

Thanks,
Vijay
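One way to check whether a fix has reached a branch is to search that branch's history for the bug number. The Python sketch below wraps `git log --grep` for that purpose. It assumes it is run inside a clone of the source tree, that a local branch with the given name exists (otherwise pass something like `origin/release-3.4`), and that commit messages carry a `BUG: <id>` line. The function names are hypothetical.

```python
import subprocess


def backport_log_command(bug_id, branch="release-3.4"):
    """Build the git command listing commits on `branch` that mention the bug."""
    # Assumption: commits reference the bug tracker with a "BUG: <id>" line.
    return ["git", "log", "--oneline", "--grep=BUG: {}".format(bug_id), branch]


def commits_for_bug(bug_id, branch="release-3.4", repo=".", run=subprocess.run):
    """Return the one-line summaries of matching commits (empty if none)."""
    result = run(backport_log_command(bug_id, branch), cwd=repo,
                 stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]
```

An empty list from `commits_for_bug(959190)` would suggest that no commit on the branch mentions the bug. A fix committed under a different bug ID would not show up, so treat the result as a hint rather than proof.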
On Tue, 24 Sep 2013 14:20:02 +0200
"Maik Kulbe" <info at linux-web-development.de> wrote:
<snip>
> Is there anything we could do to prevent this or at least something to
> find the cause of this? At the moment we have the ugly workaround to
> check the NFS status via cron and restart the server if necessary but
> that's nothing we find suitable for larger deployments..

As mentioned by Vijay and Santosh, the fix for this has
been added to the Git repository, and will be in the next
3.4.x release of Gluster.

Which OS are you using? If it's something rpm based
(RHEL/CentOS/Fedora), building your own RPMs from git
source is very easy:

  http://www.gluster.org/community/documentation/index.php/CompilingRPMS

You would build from the "release-3.4" branch, with the
resulting RPMs being drop in replacements for the existing
ones.

Does that help?

Regards and best wishes,

Justin Clift

--
Justin Clift <jclift at redhat.com>
-----Original Mail-----
From: Justin Clift [jclift at redhat.com]
Sent: 25.09.13 - 15:16:54
To: Maik Kulbe [info at linux-web-development.de]
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] NFS crashes on Gluster 3.4.0

> On Tue, 24 Sep 2013 14:20:02 +0200
> "Maik Kulbe" wrote:
> > Is there anything we could do to prevent this or at least something to
> > find the cause of this? At the moment we have the ugly workaround to
> > check the NFS status via cron and restart the server if necessary but
> > that's nothing we find suitable for larger deployments..
>
> As mentioned by Vijay and Santosh, the fix for this has
> been added to the Git repository, and will be in the next
> 3.4.x release of Gluster.
>
> Which OS are you using? If it's something rpm based
> (RHEL/CentOS/Fedora), building your own RPMs from git
> source is very easy:
>
>   http://www.gluster.org/community/documentation/index.php/CompilingRPMS
>
> You would build from the "release-3.4" branch, with the
> resulting RPMs being drop in replacements for the existing
> ones.
>
> Does that help?

Sadly this won't help, but thanks for your effort. We use the Ubuntu
repository for Gluster, so this is not an option. Also, I don't think
I'd be too happy with packages built from Git on a production system.
And neither would my boss. ;)

So I guess we'll have to wait until the next 3.4.x release. From your
experience, when would you estimate the next release will come out?

> Regards and best wishes,
>
> Justin Clift
>
> --
> Justin Clift