Matthew Wade
2016-Aug-26 09:16 UTC
[Gluster-users] glusterd daemon dead, but glusterfsd still running
Hi,

We are currently running a three-node cluster on Gluster 3.6.4.

On one of our nodes we noticed that the glusterd daemon is dead, but the glusterfsd daemons are still running, and we believe clients are still connecting and retrieving data.

The daemon has been dead for a week without us noticing.

We would like to know whether it is safe to simply start the glusterd service again.

If so, would this trigger a self-heal on all volumes? We are concerned because that would cause a performance issue.

The logs for this node are as follows:

[2016-08-19 18:01:52.804453] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f4f3ffca550] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7f4f3fd9f787] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4f3fd9f89e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4f3fd9f951] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7f4f3fd9ff1f] ))))) 0-DAOS-client-4: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2016-08-19 18:01:51.886737 (xid=0x144a1d)
[2016-08-19 18:01:52.804480] W [client-handshake.c:1588:client_dump_version_cbk] 0-DAOS-client-4: received RPC status error
[2016-08-19 18:01:52.804504] W [socket.c:620:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2016-08-19 18:02:02.900863] E [socket.c:2276:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)

If it is not safe to do so, what else should we do to resolve this?

Matt Wade
IT Operations Analyst
IOP Publishing
Temple Circus, Temple Way, Bristol BS1 6HG, UK
Direct line +44 (0)117 930 1136
ioppublishing.org

This email (and attachments) are confidential and intended for the addressee(s) only. If you are not the intended recipient please notify the sender, delete any copies and do not take action in reliance on it.
Any views expressed are the author's and do not represent those of IOP, except where specifically stated. IOP takes reasonable precautions to protect against viruses but accepts no responsibility for loss or damage arising from virus infection. For the protection of IOP's systems and staff emails are scanned automatically. Institute of Physics. Registered charity no. 293851 (England & Wales) and SCO40092 (Scotland) Registered Office: 76 Portland Place, London W1B 1NT
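The log excerpt above can be read mechanically to find the moment the local management endpoint stopped answering. The following is a minimal sketch, assuming the log line layout seen in the excerpt (timestamp, one-letter level, source location, message); the names `LOG_LINE`, `parse_line` and `first_refused` are hypothetical helpers, not part of Gluster.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above:
# [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<msg>.*)"
)

# Two lines copied from the excerpt in the message above.
SAMPLE = [
    "[2016-08-19 18:01:52.804504] W [socket.c:620:__socket_rwv] "
    "0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)",
    "[2016-08-19 18:02:02.900863] E [socket.c:2276:socket_connect_finish] "
    "0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)",
]


def parse_line(line):
    """Return a dict with ts, level, src and msg, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return record


def first_refused(lines, endpoint="127.0.0.1:24007"):
    """Timestamp of the first error-level 'Connection refused' for the endpoint."""
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        if (
            record["level"] == "E"
            and endpoint in record["msg"]
            and "Connection refused" in record["msg"]
        ):
            return record["ts"]
    return None
```

Calling `first_refused(SAMPLE)` returns the timestamp of the `socket_connect_finish` error, which is consistent with the management daemon having gone away around that time. The sketch says nothing about why the daemon died.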
Atin Mukherjee
2016-Aug-28 16:50 UTC
[Gluster-users] glusterd daemon dead, but glusterfsd still running
On Friday 26 August 2016, Matthew Wade <matthew.wade at iop.org> wrote:

> HI,
>
> We are currently running a three node cluster, on Gluster 3.6.4.
>
> On one of our nodes we noticed that the glusterd daemon is dead.
>
> But the glusterfsd daemons are still running, and we believe clients are
> connecting and retrieving data
>
> We noticed that the daemon has been dead for a week, and we didn't see it.
>
> We would like to know are we safe to just go ahead and start the glusterd
> service again.

It can be started safely, as the management and I/O paths do not interfere with each other.

> If so would this trigger a self-heal on all volumes? As this would cause a
> performance issue.

Why would you need a self-heal to be triggered if all of your bricks were, and still are, running?

> If we aren't safe to do so, what else should we do to resolve this?
--Atin
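Since the reply above separates the management path from the I/O path, one quick check after restarting glusterd is whether the management port seen in the log (127.0.0.1:24007) accepts connections again. The following is a minimal sketch using only the Python standard library; the function name `management_port_open` is hypothetical, and a successful TCP connect only shows that something is listening on the port, not that glusterd is healthy.

```python
import socket


def management_port_open(host="127.0.0.1", port=24007, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    The defaults are the endpoint that appears in the log excerpt in this thread.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

On the affected node, `management_port_open()` would be expected to return False while glusterd is down and True once the service is back. Whether any files are actually pending heal is a separate question, which `gluster volume heal <VOLNAME> info` reports per brick.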