Ernie Dunbar
2016-Mar-30 21:01 UTC
[Gluster-users] Gluster server crashes with signal 11 after probing peers.
Hi everyone.

I'm trying to add a new Gluster node to our cluster. When I probe the first node in the cluster, the new node crashes with the following report (logs start when the daemon starts):

---------
[2016-03-30 20:32:05.191659] I [MSGID: 100030] [glusterfsd.c:2332:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.9 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[2016-03-30 20:32:05.195695] I [MSGID: 106478] [glusterd.c:1337:init] 0-management: Maximum allowed open file descriptors set to 65536
[2016-03-30 20:32:05.195752] I [MSGID: 106479] [glusterd.c:1386:init] 0-management: Using /var/lib/glusterd as working directory
[2016-03-30 20:32:05.200609] W [MSGID: 103071] [rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2016-03-30 20:32:05.200648] W [MSGID: 103055] [rdma.c:4901:init] 0-rdma.management: Failed to initialize IB Device
[2016-03-30 20:32:05.200662] W [rpc-transport.c:359:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2016-03-30 20:32:05.200723] W [rpcsvc.c:1597:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2016-03-30 20:32:05.200743] E [MSGID: 106243] [glusterd.c:1610:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2016-03-30 20:32:07.135310] I [MSGID: 106513] [glusterd-store.c:2062:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30501
[2016-03-30 20:32:07.135775] I [MSGID: 106498] [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2016-03-30 20:32:07.135876] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-03-30 20:32:07.136651] W [socket.c:870:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid argument
[2016-03-30 20:32:07.136673] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2016-03-30 20:32:07.136908] I [MSGID: 106194] [glusterd-store.c:3523:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option rpc-auth.auth-glusterfs on
4: option rpc-auth.auth-unix on
5: option rpc-auth.auth-null on
6: option rpc-auth-allow-insecure on
7: option transport.socket.listen-backlog 128
8: option event-threads 1
9: option ping-timeout 0
10: option transport.socket.read-fail-log off
11: option transport.socket.keepalive-interval 2
12: option transport.socket.keepalive-time 10
13: option transport-type rdma
14: option working-directory /var/lib/glusterd
15: end-volume
16:
+------------------------------------------------------------------------------+
[2016-03-30 20:32:07.138287] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-03-30 20:32:07.138980] I [MSGID: 106544] [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID: ae191e96-9cd6-4e2b-acae-18f2cc45e6ed
[2016-03-30 20:32:07.139422] I [MSGID: 106163] [glusterd-handshake.c:1194:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30501
[2016-03-30 20:32:14.394056] I [MSGID: 106487] [glusterd-handler.c:1239:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req nfs1 24007
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-03-30 20:32:14
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.9
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f0401a78562]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f0401a9464d]
/lib/x86_64-linux-gnu/libc.so.6(+0x36d40)[0x7f0400e76d40]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_spin_lock+0x0)[0x7f04012120f0]
---------

Both nodes are running GlusterFS 3.7.9 on Ubuntu Trusty Tahr (14.04 LTS). Node 1 is running Linux kernel 3.13.0-55-generic #94-Ubuntu SMP, and Node 3, the new node, is running Linux kernel 3.13.0-77-generic #121-Ubuntu SMP. As far as I can see, this is the only difference between the systems, although the new node has the very latest Gluster packages from the launchpad.net PPA. I would expect Node 1 to have the same update, but it's hard to tell.

Any help would be much appreciated.
Mohammed Rafi K C
2016-Mar-31 06:15 UTC
[Gluster-users] Gluster server crashes with signal 11 after probing peers.
Hi Ernie,

Can you please paste the backtrace from the core file?

Regards
Rafi KC