thr3ads.net - Gluster users - [Gluster-users] Gluster server crashes with signal 11 after probing peers. [Mar 2016]


Ernie Dunbar

2016-Mar-30 21:01 UTC

[Gluster-users] Gluster server crashes with signal 11 after probing peers.

Hi everyone.

I'm trying to add a new Gluster node to our cluster, and when I try to 
probe the first node in the cluster, the new node crashes with the 
following report (logs start when the daemon starts):

---------
[2016-03-30 20:32:05.191659] I [MSGID: 100030] [glusterfsd.c:2332:main] 
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.9 
(args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[2016-03-30 20:32:05.195695] I [MSGID: 106478] [glusterd.c:1337:init] 
0-management: Maximum allowed open file descriptors set to 65536
[2016-03-30 20:32:05.195752] I [MSGID: 106479] [glusterd.c:1386:init] 
0-management: Using /var/lib/glusterd as working directory
[2016-03-30 20:32:05.200609] W [MSGID: 103071] 
[rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event 
channel creation failed [No such device]
[2016-03-30 20:32:05.200648] W [MSGID: 103055] [rdma.c:4901:init] 
0-rdma.management: Failed to initialize IB Device
[2016-03-30 20:32:05.200662] W [rpc-transport.c:359:rpc_transport_load] 
0-rpc-transport: 'rdma' initialization failed
[2016-03-30 20:32:05.200723] W [rpcsvc.c:1597:rpcsvc_transport_create] 
0-rpc-service: cannot create listener, initing the transport failed
[2016-03-30 20:32:05.200743] E [MSGID: 106243] [glusterd.c:1610:init] 
0-management: creation of 1 listeners failed, continuing with succeeded 
transport
[2016-03-30 20:32:07.135310] I [MSGID: 106513] 
[glusterd-store.c:2062:glusterd_restore_op_version] 0-glusterd: 
retrieved op-version: 30501
[2016-03-30 20:32:07.135775] I [MSGID: 106498] 
[glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] 
0-management: connect returned 0
[2016-03-30 20:32:07.135876] I [rpc-clnt.c:984:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600
[2016-03-30 20:32:07.136651] W [socket.c:870:__socket_keepalive] 
0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid 
argument
[2016-03-30 20:32:07.136673] E [socket.c:2966:socket_connect] 
0-management: Failed to set keep-alive: Invalid argument
[2016-03-30 20:32:07.136908] I [MSGID: 106194] 
[glusterd-store.c:3523:glusterd_store_retrieve_missed_snaps_list] 
0-management: No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
   1: volume management
   2:     type mgmt/glusterd
   3:     option rpc-auth.auth-glusterfs on
   4:     option rpc-auth.auth-unix on
   5:     option rpc-auth.auth-null on
   6:     option rpc-auth-allow-insecure on
   7:     option transport.socket.listen-backlog 128
   8:     option event-threads 1
   9:     option ping-timeout 0
  10:     option transport.socket.read-fail-log off
  11:     option transport.socket.keepalive-interval 2
  12:     option transport.socket.keepalive-time 10
  13:     option transport-type rdma
  14:     option working-directory /var/lib/glusterd
  15: end-volume
  16:
+------------------------------------------------------------------------------+
[2016-03-30 20:32:07.138287] I [MSGID: 101190] 
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 1
[2016-03-30 20:32:07.138980] I [MSGID: 106544] 
[glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID: 
ae191e96-9cd6-4e2b-acae-18f2cc45e6ed
[2016-03-30 20:32:07.139422] I [MSGID: 106163] 
[glusterd-handshake.c:1194:__glusterd_mgmt_hndsk_versions_ack] 
0-management: using the op-version 30501
[2016-03-30 20:32:14.394056] I [MSGID: 106487] 
[glusterd-handler.c:1239:__glusterd_handle_cli_probe] 0-glusterd: 
Received CLI probe req nfs1 24007
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-03-30 20:32:14
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.9
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f0401a78562]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f0401a9464d]
/lib/x86_64-linux-gnu/libc.so.6(+0x36d40)[0x7f0400e76d40]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_spin_lock+0x0)[0x7f04012120f0]
---------


Both nodes are running GlusterFS 3.7.9 on Ubuntu Trusty Tahr (14.04 
LTS). Node 1 is running Linux kernel 3.13.0-55-generic #94-Ubuntu SMP, 
and node 3 is running Linux kernel 3.13.0-77-generic #121-Ubuntu SMP. To 
me, this seems to be the only difference between the systems, although 
the new node has the very latest version of the Gluster packages from 
the launchpad.net PPA. I would imagine that Node 1 has the same update, 
but it's hard to tell.

Any help would be much appreciated.
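
For readers following along: the last four lines of the crash report above are stack frames in the form `library(symbol+offset)[address]`. The following is a minimal sketch, not part of the original post, of pulling those fields apart in Python. The `Frame` type, `parse_frame` helper, and the regular expression are hypothetical names introduced here for illustration, and the format is assumed only from the four lines in this report.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    """One frame from the crash report (hypothetical helper type)."""
    library: str
    symbol: Optional[str]  # None when the log prints only "+offset"
    offset: int
    address: int


# Assumed layout, taken from the report above: path(symbol+0xOFF)[0xADDR]
FRAME_RE = re.compile(
    r"^(?P<lib>/\S+?)"
    r"\((?P<sym>[A-Za-z_][\w.@]*)?\+(?P<off>0x[0-9a-fA-F]+)\)"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Return a Frame for a backtrace line, or None for any other log line."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return Frame(m["lib"], m["sym"], int(m["off"], 16), int(m["addr"], 16))


# The four frames exactly as they appear in the report.
REPORT_LINES = [
    "/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f0401a78562]",
    "/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f0401a9464d]",
    "/lib/x86_64-linux-gnu/libc.so.6(+0x36d40)[0x7f0400e76d40]",
    "/lib/x86_64-linux-gnu/libpthread.so.0(pthread_spin_lock+0x0)[0x7f04012120f0]",
]

frames: List[Frame] = [f for f in map(parse_frame, REPORT_LINES) if f is not None]

if __name__ == "__main__":
    for f in frames:
        print(f"{f.symbol or '<unknown>'} +{f.offset:#x} in {f.library}")
```

The first two frames belong to the crash handler itself and the libc frame carries no symbol name, so the last frame, `pthread_spin_lock+0x0`, appears to be where the signal arrived. The report stops there, so it does not show which glusterd function made that call.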

Mohammed Rafi K C

2016-Mar-31 06:15 UTC


Hi Ernie,

Can you please paste the backtrace from the core file?

Regards
Rafi KC
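
One common way to get the backtrace requested above is to run gdb in batch mode against the glusterd binary and the core file. The sketch below is a hedged illustration, not something from the thread: the `gdb_backtrace_command` helper is a hypothetical name, `/path/to/core` is a placeholder because the core file location depends on the system's core dump settings, and the output is only useful if debug symbols for glusterfs are available.

```python
import shlex
import subprocess
from typing import List


def gdb_backtrace_command(binary: str, core: str) -> List[str]:
    """Build a gdb invocation that prints a full backtrace of every thread.

    Uses gdb's -batch and -ex options; the binary and core paths are
    supplied by the caller.
    """
    return ["gdb", "-batch", "-ex", "thread apply all bt full", binary, core]


def run_backtrace(binary: str, core: str) -> str:
    """Run gdb and return its output (requires gdb to be installed)."""
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # /usr/sbin/glusterd is the binary named in the log; the core path is
    # an assumption and must be replaced with the real file.
    cmd = gdb_backtrace_command("/usr/sbin/glusterd", "/path/to/core")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the command first, rather than running it, makes it easy to check the paths before gdb is invoked on the crashed node.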

