Logan Barfield
2016-Jan-29 18:29 UTC
[Gluster-users] Losing connection to bricks, Gluster processes restarting
We're running a fairly large 2-replica volume across two servers. The volume is approximately 20TB of small 1K-4MB files. The volume is exported via NFS, and mounted remotely by two clients.

For the past few weeks the Gluster brick processes have been randomly restarting. Luckily they've been doing so at non-peak times, so we didn't notice until our monitoring checks happened to pick up on a zombied 'glusterfs' process.

From the logs it looks like something is blocking communication to the brick processes, and Gluster automatically restarts everything to compensate. I've so far not been able to figure out the underlying cause.

I've included log snippets from 'etc-glusterfs-glusterd.vol.log' and 'glustershd.log' below. If anyone can provide some insight into the issue it would be greatly appreciated. I'll also be happy to provide any further details as needed.

From 'etc-glusterfs-glusterd.vol.log':

[2016-01-29 05:03:47.039886] I [MSGID: 106144] [glusterd-pmap.c:274:pmap_registry_remove] 0-pmap: removing brick /export/data/brick02 on port 49155
[2016-01-29 05:03:47.075521] W [socket.c:588:__socket_rwv] 0-management: readv on /var/run/gluster/53a233b05f5d4be45dc94391bc3ebfe5.socket failed (No data available)
[2016-01-29 05:03:47.078282] I [MSGID: 106005] [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management: Brick gluster-stor02:/export/data/brick02 has disconnected from glusterd.
[2016-01-29 05:03:47.149161] W [glusterfsd.c:1236:cleanup_and_exit] (-->/lib64/libpthread.so.0() [0x3e47a079d1] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xcd) [0x405e6d] -->/usr/sbin/glusterd(cleanup_and_exit+0x65) [0x4059d5] ) 0-: received signum (15), shutting down
[2016-01-29 05:03:54.067012] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
[2016-01-29 05:03:54.071901] I [MSGID: 106478] [glusterd.c:1350:init] 0-management: Maximum allowed open file descriptors set to 65536
[2016-01-29 05:03:54.071935] I [MSGID: 106479] [glusterd.c:1399:init] 0-management: Using /var/lib/glusterd as working directory
[2016-01-29 05:03:54.075655] E [rpc-transport.c:292:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.7.6/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2016-01-29 05:03:54.075672] W [rpc-transport.c:296:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2016-01-29 05:03:54.075680] W [rpcsvc.c:1597:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2016-01-29 05:03:54.075687] E [MSGID: 106243] [glusterd.c:1623:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2016-01-29 05:03:55.869717] I [MSGID: 106513] [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30702
[2016-01-29 05:03:55.995747] I [MSGID: 106498] [glusterd-handler.c:3579:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2016-01-29 05:03:55.995866] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-01-29 05:03:56.000937] I [MSGID: 106544] [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID: 9b103ea8-d248-44fc-8f80-3e87f7c4971c
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.socket.listen-backlog 128
  8:     option ping-timeout 30
  9:     option transport.socket.read-fail-log off
 10:     option transport.socket.keepalive-interval 2
 11:     option transport.socket.keepalive-time 10
 12:     option transport-type rdma
 13:     option working-directory /var/lib/glusterd
 14: end-volume
 15:
+------------------------------------------------------------------------------+
[2016-01-29 05:03:56.002570] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2016-01-29 05:03:56.003098] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-01-29 05:03:56.003158] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2016-01-29 05:03:56.855628] I [MSGID: 106493] [glusterd-rpc-ops.c:480:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 388a8bb4-c530-44ff-838b-8f7b9e4c95db, host: 10.1.1.10, port: 0
[2016-01-29 05:03:56.856787] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2016-01-29 05:03:57.859093] I [MSGID: 106540] [glusterd-utils.c:4191:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV3 successfully
[2016-01-29 05:03:57.860228] I [MSGID: 106540] [glusterd-utils.c:4200:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV1 successfully
[2016-01-29 05:03:57.861329] I [MSGID: 106540] [glusterd-utils.c:4209:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NFSV3 successfully
[2016-01-29 05:03:57.862421] I [MSGID: 106540] [glusterd-utils.c:4218:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v4 successfully
[2016-01-29 05:03:57.863510] I [MSGID: 106540] [glusterd-utils.c:4227:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v1 successfully
[2016-01-29 05:03:57.864600] I [MSGID: 106540] [glusterd-utils.c:4236:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered ACL v3 successfully
[2016-01-29 05:03:57.870948] W [socket.c:3009:socket_connect] 0-nfs: Ignore failed connection attempt on , (No such file or directory)

From 'glustershd.log':

[2016-01-29 05:03:47.075614] W [socket.c:588:__socket_rwv] 0-data02-client-1: readv on 10.1.1.10:49155 failed (No data available)
[2016-01-29 05:03:47.076871] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-data02-client-1: disconnected from data02-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2016-01-29 05:03:47.170284] W [socket.c:588:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2016-01-29 05:03:47.639163] W [socket.c:588:__socket_rwv] 0-data02-client-0: readv on 10.1.1.11:49153 failed (No data available)
[2016-01-29 05:03:47.639206] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-data02-client-0: disconnected from data02-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-01-29 05:03:47.640222] E [MSGID: 108006] [afr-common.c:3880:afr_notify] 0-data02-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2016-01-29 05:03:57.872983] W [glusterfsd.c:1236:cleanup_and_exit] (-->/lib64/libpthread.so.0() [0x3e47a079d1] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e6d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059d5] ) 0-: received signum (15), shutting down
[2016-01-29 05:03:58.881541] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.6 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/8d72de580ccac07d2ecfc2491a9b1648.socket --xlator-option *replicate*.node-uuid=9b103ea8-d248-44fc-8f80-3e87f7c4971c)
[2016-01-29 05:03:58.890833] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-01-29 05:03:59.340030] I [graph.c:269:gf_add_cmdline_options] 0-data02-replicate-0: adding option 'node-uuid' for volume 'data02-replicate-0' with value '9b103ea8-d248-44fc-8f80-3e87f7c4971c'
[2016-01-29 05:03:59.342682] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2016-01-29 05:03:59.342742] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
[2016-01-29 05:03:59.342827] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[2016-01-29 05:03:59.342892] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 5
[2016-01-29 05:03:59.342917] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 6
[2016-01-29 05:03:59.343563] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 8
[2016-01-29 05:03:59.343569] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 7
[2016-01-29 05:03:59.343657] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 9
[2016-01-29 05:03:59.343705] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 11
[2016-01-29 05:03:59.343710] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 10
[2016-01-29 05:03:59.344278] I [MSGID: 114020] [client.c:2118:notify] 0-data02-client-0: parent translators are ready, attempting connect on transport
[2016-01-29 05:03:59.346553] I [MSGID: 114020] [client.c:2118:notify] 0-data02-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume data02-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gluster-stor01
  5:     option remote-subvolume /export/data/brick02
  6:     option transport-type socket
  7:     option username 5cc4f5d1-bcc8-4e06-ac74-520b20e2b452
  8:     option password 66b85782-5833-4f2d-ad0e-8de75247b094F
  9:     option event-threads 11
 10: end-volume
 11:
 12: volume data02-client-1
 13:     type protocol/client
 14:     option ping-timeout 42
 15:     option remote-host gluster-stor02
 16:     option remote-subvolume /export/data/brick02
 17:     option transport-type socket
 18:     option username 5cc4f5d1-bcc8-4e06-ac74-520b20e2b452
 19:     option password 66b85782-5833-4f2d-ad0e-8de75247b094
 20:     option event-threads 11
 21: end-volume
 22:
 23: volume data02-replicate-0
 24:     type cluster/replicate
 25:     option node-uuid 9b103ea8-d248-44fc-8f80-3e87f7c4971c
 26:     option background-self-heal-count 0
 27:     option metadata-self-heal on
 28:     option data-self-heal on
 29:     option entry-self-heal on
 30:     option self-heal-daemon enable
 31:     option iam-self-heal-daemon yes
 32:     subvolumes data02-client-0 data02-client-1
 33: end-volume
 34:
 35: volume glustershd
 36:     type debug/io-stats
 37:     subvolumes data02-replicate-0
 38: end-volume
 39:
+------------------------------------------------------------------------------+
[2016-01-29 05:03:59.348913] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-data02-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-01-29 05:03:59.348960] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-data02-client-1: disconnected from data02-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2016-01-29 05:03:59.436909] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-data02-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-01-29 05:03:59.436974] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-data02-client-0: disconnected from data02-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-01-29 05:03:59.436991] E [MSGID: 108006] [afr-common.c:3880:afr_notify] 0-data02-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2016-01-29 05:04:02.886317] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-data02-client-0: changing port to 49153 (from 0)
[2016-01-29 05:04:02.888761] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 0-data02-client-1: changing port to 49155 (from 0)
[2016-01-29 05:04:02.891105] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-data02-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-01-29 05:04:02.891360] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-data02-client-0: Connected to data02-client-0, attached to remote volume '/export/data/brick02'.
[2016-01-29 05:04:02.891373] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-data02-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-01-29 05:04:02.891403] I [MSGID: 108005] [afr-common.c:3841:afr_notify] 0-data02-replicate-0: Subvolume 'data02-client-0' came back up; going online.
[2016-01-29 05:04:02.891518] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-data02-client-0: Server lk version = 1
[2016-01-29 05:04:02.893074] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-data02-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-01-29 05:04:02.893251] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-data02-client-1: Connected to data02-client-1, attached to remote volume '/export/data/brick02'.
[2016-01-29 05:04:02.893276] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-data02-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-01-29 05:04:02.893401] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-data02-client-1: Server lk version = 1
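
One thing that stands out is that both glusterd and the self-heal daemon log "received signum (15)", so it looks like something on the box is sending them SIGTERM rather than the processes crashing on their own. A rough way to trace the sender is an auditd rule on kill(2); this is only a sketch (it assumes auditd is running, and the key name 'gluster-sigterm' is arbitrary):

# Record every kill(2) call that delivers signal 15. Add -S tkill -S tgkill to
# the same rule to catch those variants, and an arch=b32 rule on mixed systems.
auditctl -a always,exit -F arch=b64 -S kill -F a1=15 -k gluster-sigterm

# After the next restart, look up which process issued the SIGTERM:
ausearch -k gluster-sigterm -i | less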
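For anyone who wants to watch for the same symptom, a minimal sketch of the kind of check that flags zombied gluster processes and bricks reported offline. The volume name 'data02' comes from our setup; the ps/awk parsing is illustrative rather than our actual monitoring config, so adapt it to whatever your monitoring system expects:

#!/bin/sh
# Count zombied gluster* processes and bricks reported offline ("N") by
# 'gluster volume status'. Exit 2 (critical) if either count is non-zero.
ZOMBIES=$(ps -eo stat=,comm= | awk '$1 ~ /^Z/ && $2 ~ /^gluster/' | wc -l)
OFFLINE=$(gluster volume status data02 | awk '$1 == "Brick" && $(NF-1) == "N"' | wc -l)
if [ "$ZOMBIES" -gt 0 ] || [ "$OFFLINE" -gt 0 ]; then
    echo "CRITICAL: $ZOMBIES zombied gluster processes, $OFFLINE bricks offline"
    exit 2
fi
echo "OK: all bricks online, no zombied gluster processes"
exit 0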
Atin Mukherjee
2016-Jan-30 02:28 UTC
[Gluster-users] [Gluster-devel] Losing connection to bricks, Gluster processes restarting
Could you paste the output of 'gluster volume info'?

~Atin

On 01/29/2016 11:59 PM, Logan Barfield wrote:
> We're running a fairly large 2-replica volume across two servers. The
> volume is approximately 20TB of small 1K-4MB files. The volume is
> exported via NFS, and mounted remotely by two clients.
>
> For the past few weeks the Gluster brick processes have been randomly
> restarting.
> [...]
Logan Barfield
2016-Feb-01 17:03 UTC
[Gluster-users] [Gluster-devel] Losing connection to bricks, Gluster processes restarting
Volume Name: data02
Type: Replicate
Volume ID: 1c8928b1-f49e-4950-be06-0f8ce5adf870
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster-stor01:/export/data/brick02  <-- 10.1.1.10
Brick2: gluster-stor02:/export/data/brick02  <-- 10.1.1.11
Options Reconfigured:
server.event-threads: 5
client.event-threads: 11
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
server.statedump-path: /tmp
server.outstanding-rpc-limit: 128
performance.io-thread-count: 64
performance.nfs.read-ahead: on
performance.nfs.io-cache: on
performance.nfs.quick-read: on
performance.cache-max-file-size: 1MB
performance.client-io-threads: on
cluster.lookup-optimize: on
performance.cache-size: 1073741824
performance.write-behind-window-size: 4MB
performance.nfs.write-behind-window-size: 4MB
performance.read-ahead: off
performance.nfs.stat-prefetch: on

Status of volume: data02
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-stor01:/export/data/brick02   49153     0          Y       17411
Brick gluster-stor02:/export/data/brick02   49155     0          Y       4717
NFS Server on localhost                     2049      0          Y       17395
Self-heal Daemon on localhost               N/A       N/A        Y       17405
NFS Server on gluster-stor02                2049      0          Y       4701
Self-heal Daemon on gluster-stor02          N/A       N/A        Y       4712

Task Status of Volume data02
------------------------------------------------------------------------------
There are no active volume tasks

Note that this problem was occurring with the same frequency before we added all of the volume options above. We were running defaults up until last week, and changing them had no impact on this particular problem.

Thank You,

Logan Barfield
Tranquil Hosting

On Fri, Jan 29, 2016 at 9:28 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> Could you paste output of gluster volume info?
>
> ~Atin
>
> On 01/29/2016 11:59 PM, Logan Barfield wrote:
> > [...]
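
In case it helps, here is roughly how more state could be captured from the affected server the next time a brick drops. This is only a sketch: the brick log filename assumes the usual /var/log/glusterfs/bricks naming for this brick path, and the grep pattern is just an example.

# Dump brick and NFS-server state (written to server.statedump-path, /tmp on this volume):
gluster volume statedump data02
gluster volume statedump data02 nfs

# Pull the brick's own view of the disconnect from its log:
grep -E 'disconnect|signum|shutting down' \
    /var/log/glusterfs/bricks/export-data-brick02.log | tail -n 50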