Logan Barfield
2016-Jan-29 18:29 UTC
[Gluster-users] Losing connection to bricks, Gluster processes restarting
We're running a fairly large 2-replica volume across two servers. The volume is approximately 20TB of small 1K-4MB files. The volume is exported via NFS, and mounted remotely by two clients.

For the past few weeks the Gluster brick processes have been randomly restarting. Luckily they've been doing so at non-peak times, so we didn't notice until our monitoring checks happened to pick up on a zombied 'glusterfs' process.

From the logs it looks like something is blocking communication to the brick processes, and Gluster automatically restarts everything to compensate. I've so far not been able to figure out the underlying cause.

I've included log snippets from 'etc-glusterfs-glusterd.vol.log' and 'glustershd.log' below. If anyone can provide some insight into the issue it would be greatly appreciated. I'll also be happy to provide any further details as needed.

From 'etc-glusterfs-glusterd.vol.log':

[2016-01-29 05:03:47.039886] I [MSGID: 106144] [glusterd-pmap.c:274:pmap_registry_remove] 0-pmap: removing brick /export/data/brick02 on port 49155
[2016-01-29 05:03:47.075521] W [socket.c:588:__socket_rwv] 0-management: readv on /var/run/gluster/53a233b05f5d4be45dc94391bc3ebfe5.socket failed (No data available)
[2016-01-29 05:03:47.078282] I [MSGID: 106005] [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management: Brick gluster-stor02:/export/data/brick02 has disconnected from glusterd.
[2016-01-29 05:03:47.149161] W [glusterfsd.c:1236:cleanup_and_exit] (-->/lib64/libpthread.so.0() [0x3e47a079d1] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xcd) [0x405e6d] -->/usr/sbin/glusterd(cleanup_and_exit+0x65) [0x4059d5] ) 0-: received signum (15), shutting down
[2016-01-29 05:03:54.067012] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
[2016-01-29 05:03:54.071901] I [MSGID: 106478] [glusterd.c:1350:init] 0-management: Maximum allowed open file descriptors set to 65536
[2016-01-29 05:03:54.071935] I [MSGID: 106479] [glusterd.c:1399:init] 0-management: Using /var/lib/glusterd as working directory
[2016-01-29 05:03:54.075655] E [rpc-transport.c:292:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.7.6/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2016-01-29 05:03:54.075672] W [rpc-transport.c:296:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2016-01-29 05:03:54.075680] W [rpcsvc.c:1597:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2016-01-29 05:03:54.075687] E [MSGID: 106243] [glusterd.c:1623:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2016-01-29 05:03:55.869717] I [MSGID: 106513] [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30702
[2016-01-29 05:03:55.995747] I [MSGID: 106498] [glusterd-handler.c:3579:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2016-01-29 05:03:55.995866] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-01-29 05:03:56.000937] I [MSGID: 106544] [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID: 9b103ea8-d248-44fc-8f80-3e87f7c4971c
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.socket.listen-backlog 128
  8:     option ping-timeout 30
  9:     option transport.socket.read-fail-log off
 10:     option transport.socket.keepalive-interval 2
 11:     option transport.socket.keepalive-time 10
 12:     option transport-type rdma
 13:     option working-directory /var/lib/glusterd
 14: end-volume
 15:
+------------------------------------------------------------------------------+
[2016-01-29 05:03:56.002570] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2016-01-29 05:03:56.003098] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-01-29 05:03:56.003158] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2016-01-29 05:03:56.855628] I [MSGID: 106493] [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 388a8bb4-c530-44ff-838b-8f7b9e4c95db, host: 10.1.1.10, port: 0
[2016-01-29 05:03:56.856787] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2016-01-29 05:03:57.859093] I [MSGID: 106540] [glusterd-utils.c:4191:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV3 successfully
[2016-01-29 05:03:57.860228] I [MSGID: 106540] [glusterd-utils.c:4200:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV1 successfully
[2016-01-29 05:03:57.861329] I [MSGID: 106540] [glusterd-utils.c:4209:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NFSV3 successfully
[2016-01-29 05:03:57.862421] I [MSGID: 106540] [glusterd-utils.c:4218:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v4 successfully
[2016-01-29 05:03:57.863510] I [MSGID: 106540] [glusterd-utils.c:4227:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v1 successfully
[2016-01-29 05:03:57.864600] I [MSGID: 106540] [glusterd-utils.c:4236:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered ACL v3 successfully
[2016-01-29 05:03:57.870948] W [socket.c:3009:socket_connect] 0-nfs: Ignore failed connection attempt on , (No such file or directory)

From 'glustershd.log':

[2016-01-29 05:03:47.075614] W [socket.c:588:__socket_rwv] 0-data02-client-1: readv on 10.1.1.10:49155 failed (No data available)
[2016-01-29 05:03:47.076871] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-data02-client-1: disconnected from data02-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2016-01-29 05:03:47.170284] W [socket.c:588:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2016-01-29 05:03:47.639163] W [socket.c:588:__socket_rwv] 0-data02-client-0: readv on 10.1.1.11:49153 failed (No data available)
[2016-01-29 05:03:47.639206] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-data02-client-0: disconnected from data02-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-01-29 05:03:47.640222] E [MSGID: 108006] [afr-common.c:3880:afr_notify] 0-data02-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2016-01-29 05:03:57.872983] W [glusterfsd.c:1236:cleanup_and_exit] (-->/lib64/libpthread.so.0() [0x3e47a079d1] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e6d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059d5] ) 0-: received signum (15), shutting down
[2016-01-29 05:03:58.881541] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.6 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/8d72de580ccac07d2ecfc2491a9b1648.socket --xlator-option *replicate*.node-uuid=9b103ea8-d248-44fc-8f80-3e87f7c4971c)
[2016-01-29 05:03:58.890833] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-01-29 05:03:59.340030] I [graph.c:269:gf_add_cmdline_options] 0-data02-replicate-0: adding option 'node-uuid' for volume 'data02-replicate-0' with value '9b103ea8-d248-44fc-8f80-3e87f7c4971c'
[2016-01-29 05:03:59.342682] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2016-01-29 05:03:59.342742] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
[2016-01-29 05:03:59.342827] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[2016-01-29 05:03:59.342892] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 5
[2016-01-29 05:03:59.342917] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 6
[2016-01-29 05:03:59.343563] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 8
[2016-01-29 05:03:59.343569] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 7
[2016-01-29 05:03:59.343657] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 9
[2016-01-29 05:03:59.343705] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 11
[2016-01-29 05:03:59.343710] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 10
[2016-01-29 05:03:59.344278] I [MSGID: 114020] [client.c:2118:notify] 0-data02-client-0: parent translators are ready, attempting connect on transport
[2016-01-29 05:03:59.346553] I [MSGID: 114020] [client.c:2118:notify] 0-data02-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume data02-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gluster-stor01
  5:     option remote-subvolume /export/data/brick02
  6:     option transport-type socket
  7:     option username 5cc4f5d1-bcc8-4e06-ac74-520b20e2b452
  8:     option password 66b85782-5833-4f2d-ad0e-8de75247b094F
  9:     option event-threads 11
 10: end-volume
 11:
 12: volume data02-client-1
 13:     type protocol/client
 14:     option ping-timeout 42
 15:     option remote-host gluster-stor02
 16:     option remote-subvolume /export/data/brick02
 17:     option transport-type socket
 18:     option username 5cc4f5d1-bcc8-4e06-ac74-520b20e2b452
 19:     option password 66b85782-5833-4f2d-ad0e-8de75247b094
 20:     option event-threads 11
 21: end-volume
 22:
 23: volume data02-replicate-0
 24:     type cluster/replicate
 25:     option node-uuid 9b103ea8-d248-44fc-8f80-3e87f7c4971c
 26:     option background-self-heal-count 0
 27:     option metadata-self-heal on
 28:     option data-self-heal on
 29:     option entry-self-heal on
 30:     option self-heal-daemon enable
 31:     option iam-self-heal-daemon yes
 32:     subvolumes data02-client-0 data02-client-1
 33: end-volume
 34:
 35: volume glustershd
 36:     type debug/io-stats
 37:     subvolumes data02-replicate-0
 38: end-volume
 39:
+------------------------------------------------------------------------------+
[2016-01-29 05:03:59.348913] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-data02-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-01-29 05:03:59.348960] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-data02-client-1: disconnected from data02-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2016-01-29 05:03:59.436909] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-data02-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-01-29 05:03:59.436974] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-data02-client-0: disconnected from data02-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-01-29 05:03:59.436991] E [MSGID: 108006] [afr-common.c:3880:afr_notify] 0-data02-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2016-01-29 05:04:02.886317] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-data02-client-0: changing port to 49153 (from 0)
[2016-01-29 05:04:02.888761] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-data02-client-1: changing port to 49155 (from 0)
[2016-01-29 05:04:02.891105] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-data02-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-01-29 05:04:02.891360] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-data02-client-0: Connected to data02-client-0, attached to remote volume '/export/data/brick02'.
[2016-01-29 05:04:02.891373] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-data02-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-01-29 05:04:02.891403] I [MSGID: 108005] [afr-common.c:3841:afr_notify] 0-data02-replicate-0: Subvolume 'data02-client-0' came back up; going online.
[2016-01-29 05:04:02.891518] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-data02-client-0: Server lk version = 1
[2016-01-29 05:04:02.893074] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-data02-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-01-29 05:04:02.893251] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-data02-client-1: Connected to data02-client-1, attached to remote volume '/export/data/brick02'.
[2016-01-29 05:04:02.893276] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-data02-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-01-29 05:04:02.893401] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-data02-client-1: Server lk version = 1
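
One thing that stands out is that both glusterd and the self-heal daemon log "received signum (15)", so it looks like something on the box is sending them SIGTERM rather than the processes crashing on their own. A rough way to trace the sender is an auditd rule on kill(2); this is only a sketch (it assumes auditd is running, and the key name 'gluster-sigterm' is arbitrary):

# Record every kill(2) call that delivers signal 15. Add -S tkill -S tgkill to
# the same rule to catch those variants, and an arch=b32 rule on mixed systems.
auditctl -a always,exit -F arch=b64 -S kill -F a1=15 -k gluster-sigterm

# After the next restart, look up which process issued the SIGTERM:
ausearch -k gluster-sigterm -i | less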
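For anyone who wants to watch for the same symptom, a minimal sketch of the kind of check that flags zombied gluster processes and bricks reported offline. The volume name 'data02' comes from our setup; the ps/awk parsing is illustrative rather than our actual monitoring config, so adapt it to whatever your monitoring system expects:

#!/bin/sh
# Count zombied gluster* processes and bricks reported offline ("N") by
# 'gluster volume status'. Exit 2 (critical) if either count is non-zero.
ZOMBIES=$(ps -eo stat=,comm= | awk '$1 ~ /^Z/ && $2 ~ /^gluster/' | wc -l)
OFFLINE=$(gluster volume status data02 | awk '$1 == "Brick" && $(NF-1) == "N"' | wc -l)
if [ "$ZOMBIES" -gt 0 ] || [ "$OFFLINE" -gt 0 ]; then
    echo "CRITICAL: $ZOMBIES zombied gluster processes, $OFFLINE bricks offline"
    exit 2
fi
echo "OK: all bricks online, no zombied gluster processes"
exit 0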
Atin Mukherjee
2016-Jan-30 02:28 UTC
[Gluster-users] [Gluster-devel] Losing connection to bricks, Gluster processes restarting
Could you paste the output of 'gluster volume info'?

~Atin

On 01/29/2016 11:59 PM, Logan Barfield wrote:
> We're running a fairly large 2-replica volume across two servers. The
> volume is approximately 20TB of small 1K-4MB files. The volume is
> exported via NFS, and mounted remotely by two clients.
>
> For the past few weeks the Gluster brick processes have been randomly
> restarting.
> [...]
Logan Barfield
2016-Feb-01 17:03 UTC
[Gluster-users] [Gluster-devel] Losing connection to bricks, Gluster processes restarting
Volume Name: data02
Type: Replicate
Volume ID: 1c8928b1-f49e-4950-be06-0f8ce5adf870
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster-stor01:/export/data/brick02  <-- 10.1.1.10
Brick2: gluster-stor02:/export/data/brick02  <-- 10.1.1.11
Options Reconfigured:
server.event-threads: 5
client.event-threads: 11
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
server.statedump-path: /tmp
server.outstanding-rpc-limit: 128
performance.io-thread-count: 64
performance.nfs.read-ahead: on
performance.nfs.io-cache: on
performance.nfs.quick-read: on
performance.cache-max-file-size: 1MB
performance.client-io-threads: on
cluster.lookup-optimize: on
performance.cache-size: 1073741824
performance.write-behind-window-size: 4MB
performance.nfs.write-behind-window-size: 4MB
performance.read-ahead: off
performance.nfs.stat-prefetch: on

Status of volume: data02
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-stor01:/export/data/brick02   49153     0          Y       17411
Brick gluster-stor02:/export/data/brick02   49155     0          Y       4717
NFS Server on localhost                     2049      0          Y       17395
Self-heal Daemon on localhost               N/A       N/A        Y       17405
NFS Server on gluster-stor02                2049      0          Y       4701
Self-heal Daemon on gluster-stor02          N/A       N/A        Y       4712

Task Status of Volume data02
------------------------------------------------------------------------------
There are no active volume tasks

Note that this problem was occurring with the same frequency before we added all of the volume options above. We were running defaults up until last week, and changing them had no impact on this particular problem.

Thank You,

Logan Barfield
Tranquil Hosting

On Fri, Jan 29, 2016 at 9:28 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> Could you paste output of gluster volume info?
>
> ~Atin
>
> On 01/29/2016 11:59 PM, Logan Barfield wrote:
> > [...]
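
In case it helps, here is roughly how more state could be captured from the affected server the next time a brick drops. This is only a sketch: the brick log filename assumes the usual /var/log/glusterfs/bricks naming for this brick path, and the grep pattern is just an example.

# Dump brick and NFS-server state (written to server.statedump-path, /tmp on this volume):
gluster volume statedump data02
gluster volume statedump data02 nfs

# Pull the brick's own view of the disconnect from its log:
grep -E 'disconnect|signum|shutting down' \
    /var/log/glusterfs/bricks/export-data-brick02.log | tail -n 50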