Joe Julian
2015-Apr-07 04:22 UTC
[Gluster-users] Unable to make HA work; mounts hang on remote node reboot
On 04/06/2015 09:00 PM, Ravishankar N wrote:
>
> On 04/07/2015 04:15 AM, CJ Baar wrote:
>> I am hoping someone can give me some direction on this. I have been
>> searching and trying various tweaks all day. I am trying to set up a
>> two-node cluster with a replicated volume. Each node has a brick
>> under /export, and a local mount using glusterfs under /mnt.
>> The volume was created as:
>>
>> gluster volume create test1 rep 2 g01.x.local:/exports/sdb1/brick g02.x.local:/exports/sdb1/brick
>> gluster volume start test1
>> mount -t glusterfs g01.x.local:/test1 /mnt/test1
>>
>> When I write a file to one node, it shows up instantly on the other...
>> just as I expect it to.
>>
>> My problem is that if I reboot one node, the mount on the other
>> completely hangs until the rebooted node comes back up. This seems to
>> defeat the purpose of being highly-available. Is there some setting I
>> am missing? How do I keep the volume on a single node alive during a
>> failure?
>> Any info is appreciated. Thank you.
>
> You can explore the network.ping-timeout setting; try reducing it
> from the default value of 42 seconds.
> -Ravi

That's probably wrong. If you're doing a proper reboot, the services should be stopped before shutting down, which will do all the proper handshaking for shutting down a TCP connection. This allows the client to avoid the ping-timeout. Ping-timeout only comes into play if there's a sudden, unexpected communication loss with the server, such as power loss, network partition, etc. Most communication losses should be transient, and recovery is less impactful if you can wait for the transient issue to resolve.

No, if you're hanging when one server is shut down, then your client isn't connecting to all the servers as it should. Check your client logs to figure out why.
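One way to act on the advice to check the client logs is to list which brick connections the client actually reports. The following is only a minimal sketch: the function name connected_subvols and the log path in the usage comment are assumptions for illustration, and the pattern relies on the "Connected to <name>, attached to remote volume" wording that GlusterFS 3.6 clients write at mount time. For a two-brick replica you would expect two names in the output.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): print the distinct client
# subvolumes that a FUSE client log reports as connected.
# Assumes log lines of the form:
#   ... 0-test1-client-0: Connected to test1-client-0, attached to remote volume '...'.
connected_subvols() {
    # $1 = path to the client log file
    sed -n 's/.*: Connected to \([^,]*\), attached to remote volume.*/\1/p' "$1" | sort -u
}

# Example usage (the path is an assumption; adjust it to your mount's log):
#   connected_subvols /var/log/glusterfs/mnt-test1.log
```

If only one subvolume is listed after a fresh mount, the client never reached the second brick, which would explain a hang when the connected server goes away.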
Hoggins!
2015-Apr-07 08:19 UTC
[Gluster-users] Unable to make HA work; mounts hang on remote node reboot
Hello,

On 07/04/2015 06:22, Joe Julian wrote:
> That's probably wrong. If you're doing a proper reboot, the services
> should be stopped before shutting down, which will do all the proper
> handshaking for shutting down a tcp connection. This allows the client
> to avoid the ping-timeout. Ping-timeout only comes in to play if
> there's a sudden - unexpected communication loss with the server such
> as power loss, network partition, etc. Most communication losses
> should be transient and recovery is less impactful if you can wait for
> the transient issue to resolve.
>
> No, if you're hanging when one server is shut down, then your client
> isn't connecting to all the servers as it should. Check your client
> logs to figure out why.

I used to see this dramatic behavior on my bricks with two servers connected through an IPSEC tunnel. It turned out that, when one of the servers was shut down, the IPSEC tunnel was closed before the Gluster services were stopped, so the "connection close" handshake never reached the other end.

My hosting company has since implemented a virtual rack system that lets me seamlessly connect servers across the country as if they were on the same physical switch. I do not encounter that problem anymore.

Cheers!
CJ Baar
2015-Apr-07 16:41 UTC
[Gluster-users] Unable to make HA work; mounts hang on remote node reboot
> On Apr 6, 2015, at 10:22 PM, Joe Julian <joe at julianfamily.org> wrote:
>
> On 04/06/2015 09:00 PM, Ravishankar N wrote:
>>
>> On 04/07/2015 04:15 AM, CJ Baar wrote:
>>> I am hoping someone can give me some direction on this. I have been searching and trying various tweaks all day. I am trying to set up a two-node cluster with a replicated volume. Each node has a brick under /export, and a local mount using glusterfs under /mnt.
>>> The volume was created as:
>>>
>>> gluster volume create test1 rep 2 g01.x.local:/exports/sdb1/brick g02.x.local:/exports/sdb1/brick
>>> gluster volume start test1
>>> mount -t glusterfs g01.x.local:/test1 /mnt/test1
>>>
>>> When I write a file to one node, it shows up instantly on the other... just as I expect it to.
>>>
>>> My problem is that if I reboot one node, the mount on the other completely hangs until the rebooted node comes back up. This seems to defeat the purpose of being highly-available. Is there some setting I am missing? How do I keep the volume on a single node alive during a failure?
>>> Any info is appreciated. Thank you.
>>
>> You can explore the network.ping-timeout setting; try reducing it from the default value of 42 seconds.
>> -Ravi
>
> That's probably wrong. If you're doing a proper reboot, the services should be stopped before shutting down, which will do all the proper handshaking for shutting down a tcp connection. This allows the client to avoid the ping-timeout. Ping-timeout only comes in to play if there's a sudden - unexpected communication loss with the server such as power loss, network partition, etc. Most communication losses should be transient and recovery is less impactful if you can wait for the transient issue to resolve.
>
> No, if you're hanging when one server is shut down, then your client isn't connecting to all the servers as it should. Check your client logs to figure out why.

The logs, as I interpret them, show both bricks being connected successfully when I do a mount (mount -t glusterfs g01.x.local:/test1 /mnt/test1). The client even claims to be setting the read preference to the correct local brick.

[2015-04-07 16:13:05.581085] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-test1-client-0: changing port to 49152 (from 0)
[2015-04-07 16:13:05.583826] I [client-handshake.c:1413:select_server_supported_programs] 0-test1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-04-07 16:13:05.584017] I [client-handshake.c:1200:client_setvolume_cbk] 0-test1-client-0: Connected to test1-client-0, attached to remote volume '/exports/sdb1/brick'.
[2015-04-07 16:13:05.584030] I [client-handshake.c:1210:client_setvolume_cbk] 0-test1-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-04-07 16:13:05.584122] I [MSGID: 108005] [afr-common.c:3552:afr_notify] 0-test1-replicate-0: Subvolume 'test1-client-0' came back up; going online.
[2015-04-07 16:13:05.584146] I [client-handshake.c:188:client_set_lk_version_cbk] 0-test1-client-0: Server lk version = 1
[2015-04-07 16:13:05.585647] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-test1-client-1: changing port to 49152 (from 0)
[2015-04-07 16:13:05.590017] I [client-handshake.c:1413:select_server_supported_programs] 0-test1-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-04-07 16:13:05.591067] I [client-handshake.c:1200:client_setvolume_cbk] 0-test1-client-1: Connected to test1-client-1, attached to remote volume '/exports/sdb1/brick'.
[2015-04-07 16:13:05.591079] I [client-handshake.c:1210:client_setvolume_cbk] 0-test1-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-04-07 16:13:05.595077] I [fuse-bridge.c:5080:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-04-07 16:13:05.595144] I [client-handshake.c:188:client_set_lk_version_cbk] 0-test1-client-1: Server lk version = 1
[2015-04-07 16:13:05.595265] I [fuse-bridge.c:4009:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.22
[2015-04-07 16:13:05.596883] I [afr-common.c:1484:afr_local_discovery_cbk] 0-test1-replicate-0: selecting local read_child test1-client-0

This is all the log I get on node1 when I drop node2. It takes almost two minutes for node1 to resume.

[2015-04-07 16:20:48.278742] W [socket.c:611:__socket_rwv] 0-management: readv on 172.32.65.241:24007 failed (No data available)
[2015-04-07 16:20:48.278837] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has disconnected from glusterd.
[2015-04-07 16:20:48.279062] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f736ad56550] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f735fdf1df8] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f735fd662c2] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f735fd51a80] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7f736ab2bf63] ))))) 0-management: Lock for vol test1 not held
[2015-04-07 16:22:24.766177] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-04-07 16:22:24.766587] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test1

If I try a "graceful" shutdown by manually stopping the glusterd services, the mount stays up and works... until the node itself is shut down. This is the log from node1 after issuing "service glusterd stop" on node2.

[2015-04-07 16:32:57.224545] W [socket.c:611:__socket_rwv] 0-management: readv on 172.32.65.241:24007 failed (No data available)
[2015-04-07 16:32:57.224612] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has disconnected from glusterd.
[2015-04-07 16:32:57.224829] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f736ad56550] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x428)[0x7f735fdf1df8] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x262)[0x7f735fd662c2] (--> /usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f735fd51a80] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7f736ab2bf63] ))))) 0-management: Lock for vol test1 not held
[2015-04-07 16:33:03.506088] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-04-07 16:33:03.506619] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test1
[2015-04-07 16:33:08.498391] E [socket.c:2267:socket_connect_finish] 0-management: connection to 172.32.65.241:24007 failed (Connection refused)

At this point, the mount on node1 is still responsive, even though gluster itself is down on node2, as confirmed by the status output:

Status of volume: test1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick g01.x.local:/exports/sdb1/brick           49152   Y       22739
NFS Server on localhost                         2049    Y       22746
Self-heal Daemon on localhost                   N/A     Y       22751

Task Status of Volume test1
------------------------------------------------------------------------------
There are no active volume tasks

Then I issue "init 0" on node2, and the mount on node1 becomes unresponsive. This is the log from node1:

[2015-04-07 16:36:04.250693] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-04-07 16:36:04.251102] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test1
The message "I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has disconnected from glusterd." repeated 39 times between [2015-04-07 16:34:40.609878] and [2015-04-07 16:36:37.752489]
[2015-04-07 16:36:40.755989] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has disconnected from glusterd.

This does not seem like desired behaviour. I was trying to create this cluster because I was under the impression it would be more resilient than a single-point-of-failure NFS server. However, if the mount halts when one node in the cluster dies, then I'm no better off.

I also can't seem to figure out how to bring a volume online if only one node in the cluster is running; again, not really functioning as HA. The gluster service runs and the volume "starts", but it is not "online" or mountable until both nodes are running. In a situation where a node fails and we need storage online before we can troubleshoot the cause of the node failure, how do I get a volume to go online?

Thanks.
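
The status check described in this message can also be done mechanically. The sketch below is a hypothetical helper (the name online_bricks is an assumption, not a gluster command) that reads "gluster volume status" output on stdin and prints the bricks whose Online column is "Y". It assumes each brick occupies a single line, with Online as the second-to-last column, as in the output quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read `gluster volume status` output on stdin and
# print the bricks marked online.
# Assumes one line per brick, laid out as: Brick <host:path> <port> <Y|N> <pid>
online_bricks() {
    awk '$1 == "Brick" && $(NF-1) == "Y" { print $2 }'
}

# Example usage on a node where glusterd is running:
#   gluster volume status test1 | online_bricks
```

Run against the status output above, this would list only the g01.x.local brick, matching the observation that node2's brick is no longer reported.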