On 04/04/2016 10:40 PM, Steve Dainard wrote:
> 1 of 6 glusterd daemons crashed over the weekend. Gluster 3.7.6 on CentOS 7.
> No core dump created after crash.
Without a core dump it will be very difficult to debug the cause of the
crash. Have you checked all the locations? Are you sure that core_pattern
has not been overwritten by ABRT?
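
On CentOS 7 the following is a quick way to check (the ABRT dump
directory below is the distribution default; if your abrt.conf sets a
different DumpLocation, look there instead):

  # If ABRT owns core handling, this prints a pipe to abrt-hook-ccpp
  # instead of a plain file pattern:
  cat /proc/sys/kernel/core_pattern

  # Any dump ABRT caught should show up here:
  ls -lt /var/spool/abrt/

  # Also confirm the running glusterd has a non-zero core limit:
  grep 'core file size' /proc/$(pidof glusterd)/limits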
> I was not the one who responded to the failure this morning, so I didn't
> get a chance to see if the distributed volume 'storage' (which has a
> brick on all 6 nodes) was still operational, and it's not entirely clear
> from the logs if it was.
>
> A systemctl restart glusterd brought the daemon back online, including
> the 3 replica volumes. If the distributed brick was offline, would its
> startup be logged here?
>
> Apr 04 08:47:57 gluster01. systemd[1]: Starting GlusterFS, a clustered
> file-system server...
> -- Subject: Unit glusterd.service has begun start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit glusterd.service has begun starting up.
> Apr 04 08:47:58 gluster01. systemd[1]: Started GlusterFS, a clustered
> file-system server.
> -- Subject: Unit glusterd.service has finished start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit glusterd.service has finished starting up.
> --
> -- The start-up result is done.
> Apr 04 08:47:58 gluster01. polkitd[8795]: Unregistered Authentication
> Agent for unix-process:6833:300817052 (system bus name :1.
> Apr 04 08:47:58 gluster01. etc-glusterfs-glusterd.vol[6849]: [2016-04-04
> 15:47:58.284954] C [MSGID: 106003]
> [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume export-domain-storage.
> Starting local bricks.
> Apr 04 08:47:58 gluster01. etc-glusterfs-glusterd.vol[6849]: [2016-04-04
> 15:47:58.285842] C [MSGID: 106003]
> [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume iso-storage. Starting
> local bricks.
> Apr 04 08:47:58 gluster01. etc-glusterfs-glusterd.vol[6849]: [2016-04-04
> 15:47:58.286116] C [MSGID: 106003]
> [glusterd-server-quorum.c:346:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume vm-storage. Starting
> local bricks.
>
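Note that the quorum messages above only cover volumes with server-side
quorum enabled, so the absence of 'storage' there is not conclusive
either way. The quickest way to confirm whether the 'storage' brick on
this node came back is the standard status command, run on any node:

  gluster volume status storage

Each brick should show a Y in the Online column along with its port and
brick-process PID; the brick's own log under /var/log/glusterfs/bricks/
would also show whether it ever went down.
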
> etc-glusterfs-glusterd.vol.log-20160403 attached:
>
> pending frames:
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 6
> time of crash:
> 2016-04-02 20:45:01
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.7.6
> /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7fabd656c012]
> /lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7fabd65884dd]
> /lib64/libc.so.6(+0x35670)[0x7fabd4c59670]
> /lib64/libc.so.6(gsignal+0x37)[0x7fabd4c595f7]
> /lib64/libc.so.6(abort+0x148)[0x7fabd4c5ace8]
> /lib64/libc.so.6(+0x75317)[0x7fabd4c99317]
> /lib64/libc.so.6(+0x7b184)[0x7fabd4c9f184]
> /lib64/libc.so.6(+0x7e961)[0x7fabd4ca2961]
> /lib64/libc.so.6(__libc_malloc+0x4c)[0x7fabd4ca38dc]
> /lib64/libc.so.6(xdr_string+0x125)[0x7fabd4d4d4f5]
> /lib64/libgfxdr.so.0(xdr_gd1_mgmt_commit_op_rsp+0xdb)[0x7fabd611f07b]
> /lib64/libgfxdr.so.0(xdr_to_generic+0x4b)[0x7fabd612037b]
> /usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(_gd_syncop_commit_op_cbk+0xa5)[0x7fabcb0ddc55]
> /usr/lib64/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_big_locked_cbk+0x4c)[0x7fabcb08348c]
> /lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fabd633ab80]
> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7fabd633ae3f]
> /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fabd6336983]
> /usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0x9506)[0x7fabc8523506]
> /usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0xc3f4)[0x7fabc85263f4]
> /lib64/libglusterfs.so.0(+0x878ea)[0x7fabd65cd8ea]
> /lib64/libpthread.so.0(+0x7dc5)[0x7fabd53d4dc5]
> /lib64/libc.so.6(clone+0x6d)[0x7fabd4d1a28d]
From the backtrace it looks like glusterd crashed while de-serializing
the commit-op response: the abort (signal 6) is raised from inside
malloc() during xdr_string(), which usually points to heap corruption.
Do you have a reproducer for this?
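
If a core does turn up, or you can reproduce the crash with cores
enabled, the usual first captures from gdb would be (standard gdb
invocation; /usr/sbin/glusterd is the stock CentOS 7 binary path, and
the core path is a placeholder):

  gdb /usr/sbin/glusterd /path/to/core
  (gdb) bt full                # crashing thread, with local variables
  (gdb) thread apply all bt    # backtraces of every thread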
>
> glustershd.log-20160403
>
> [2016-04-02 20:45:02.594519] W [socket.c:588:__socket_rwv] 0-glusterfs:
> readv on 127.0.0.1:24007 failed (No data available)
> [2016-04-02 20:45:12.667862] E [socket.c:2278:socket_connect_finish]
> 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
> [2016-04-03 10:39:13.626344] W [rpc-clnt.c:1586:rpc_clnt_submit]
> 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: GlusterFS
> Handshake, ProgVers: 2, Proc: 2) to rpc-transport (glusterfs)
>
>
> Thanks
>
>
>