On 04/04/2016 10:40 PM, Steve Dainard wrote:
> 1 of 6 glusterd daemons crashed over the weekend. Gluster 3.7.6 on CentOS 7.
> No core dump created after crash.
Without a core dump it will be very difficult to debug the cause of the
crash. Have you checked all the locations? Are you sure that core_pattern
has not been overwritten by ABRT?
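
On CentOS 7 the following is a quick way to check (the ABRT dump
directory below is the distribution default; if your abrt.conf sets a
different DumpLocation, look there instead):

  # If ABRT owns core handling, this prints a pipe to abrt-hook-ccpp
  # instead of a plain file pattern:
  cat /proc/sys/kernel/core_pattern

  # Any dump ABRT caught should show up here:
  ls -lt /var/spool/abrt/

  # Also confirm the running glusterd has a non-zero core limit:
  grep 'core file size' /proc/$(pidof glusterd)/limits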
> I was not the one who responded to the failure this morning, so I didn't
> get a chance to see if the distributed volume 'storage' (which has a
> brick on all 6 nodes) was still operational, and it's not entirely clear
> from the logs if it was.
>
> A systemctl restart glusterd brought the daemon back online, including
> the 3 replica volumes. If the distributed brick was offline, would its
> startup be logged here?
>
> Apr 04 08:47:57 gluster01. systemd[1]: Starting GlusterFS, a clustered
> file-system server...
> -- Subject: Unit glusterd.service has begun start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit glusterd.service has begun starting up.
> Apr 04 08:47:58 gluster01. systemd[1]: Started GlusterFS, a clustered
> file-system server.
> -- Subject: Unit glusterd.service has finished start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit glusterd.service has finished starting up.
> --
> -- The start-up result is done.
> Apr 04 08:47:58 gluster01. polkitd[8795]: Unregistered Authentication
> Agent for unix-process:6833:300817052 (system bus name :1.
> Apr 04 08:47:58 gluster01. etc-glusterfs-glusterd.vol[6849]: [2016-04-04
> 15:47:58.284954] C [MSGID: 106003]
> [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume export-domain-storage.
> Starting local bricks.
> Apr 04 08:47:58 gluster01. etc-glusterfs-glusterd.vol[6849]: [2016-04-04
> 15:47:58.285842] C [MSGID: 106003]
> [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume iso-storage. Starting
> local bricks.
> Apr 04 08:47:58 gluster01. etc-glusterfs-glusterd.vol[6849]: [2016-04-04
> 15:47:58.286116] C [MSGID: 106003]
> [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume vm-storage. Starting
> local bricks.
>
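Note that the quorum messages above only cover volumes with server-side
quorum enabled, so the absence of 'storage' there is not conclusive
either way. The quickest way to confirm whether the 'storage' brick on
this node came back is the standard status command, run on any node:

  gluster volume status storage

Each brick should show a Y in the Online column along with its port and
brick-process PID; the brick's own log under /var/log/glusterfs/bricks/
would also show whether it ever went down.
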
> etc-glusterfs-glusterd.vol.log-20160403 attached:
>
> pending frames:
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 6
> time of crash:
> 2016-04-02 20:45:01
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.7.6
> /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7fabd656c012]
> /lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7fabd65884dd]
> /lib64/libc.so.6(+0x35670)[0x7fabd4c59670]
> /lib64/libc.so.6(gsignal+0x37)[0x7fabd4c595f7]
> /lib64/libc.so.6(abort+0x148)[0x7fabd4c5ace8]
> /lib64/libc.so.6(+0x75317)[0x7fabd4c99317]
> /lib64/libc.so.6(+0x7b184)[0x7fabd4c9f184]
> /lib64/libc.so.6(+0x7e961)[0x7fabd4ca2961]
> /lib64/libc.so.6(__libc_malloc+0x4c)[0x7fabd4ca38dc]
> /lib64/libc.so.6(xdr_string+0x125)[0x7fabd4d4d4f5]
> /lib64/libgfxdr.so.0(xdr_gd1_mgmt_commit_op_rsp+0xdb)[0x7fabd611f07b]
> /lib64/libgfxdr.so.0(xdr_to_generic+0x4b)[0x7fabd612037b]
> /usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(_gd_syncop_commit_op_cbk+0xa5)[0x7fabcb0ddc55]
> /usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_cbk+0x4c)[0x7fabcb08348c]
> /lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fabd633ab80]
> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7fabd633ae3f]
> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fabd6336983]
> /usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0x9506)[0x7fabc8523506]
> /usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0xc3f4)[0x7fabc85263f4]
> /lib64/libglusterfs.so.0(+0x878ea)[0x7fabd65cd8ea]
> /lib64/libpthread.so.0(+0x7dc5)[0x7fabd53d4dc5]
> /lib64/libc.so.6(clone+0x6d)[0x7fabd4d1a28d]
From the backtrace it looks like glusterd crashed while de-serializing
the commit-op response: the abort (signal 6) is raised from inside
malloc() during xdr_string(), which usually points to heap corruption.
Do you have a reproducer for this?
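
If a core does turn up, or you can reproduce the crash with cores
enabled, the usual first captures from gdb would be (standard gdb
invocation; /usr/sbin/glusterd is the stock CentOS 7 binary path, and
the core path is a placeholder):

  gdb /usr/sbin/glusterd /path/to/core
  (gdb) bt full                # crashing thread, with local variables
  (gdb) thread apply all bt    # backtraces of every thread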
>
> glustershd.log-20160403
>
> [2016-04-02 20:45:02.594519] W [socket.c:588:__socket_rwv] 0-glusterfs:
> readv on 127.0.0.1:24007 failed (No data available)
> [2016-04-02 20:45:12.667862] E [socket.c:2278:socket_connect_finish]
> 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
> [2016-04-03 10:39:13.626344] W [rpc-clnt.c:1586:rpc_clnt_submit]
> 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: GlusterFS
> Handshake, ProgVers: 2, Proc: 2) to rpc-transport (glusterfs)
>
>
> Thanks
>
>
>