Ravishankar N
2015-Apr-07 04:00 UTC
[Gluster-users] Unable to make HA work; mounts hang on remote node reboot
On 04/07/2015 04:15 AM, CJ Baar wrote:
> I am hoping someone can give me some direction on this. I have been searching and trying various tweaks all day. I am trying to set up a two-node cluster with a replicated volume. Each node has a brick under /export and a local mount using glusterfs under /mnt. The volume was created as:
>   gluster volume create test1 rep 2 g01.x.local:/exports/sdb1/brick g02.x.local:/exports/sdb1/brick
>   gluster volume start test1
>   mount -t glusterfs g01.x.local:/test1 /mnt/test1
> When I write a file to one node, it shows up instantly on the other, just as I expect it to.
>
> My problem is that if I reboot one node, the mount on the other completely hangs until the rebooted node comes back up. This seems to defeat the purpose of being highly available. Is there some setting I am missing? How do I keep the volume on a single node alive during a failure?
> Any info is appreciated. Thank you.

You can explore the network.ping-timeout setting; try reducing it from the default value of 42 seconds.
-Ravi
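(For reference: ping-timeout is a per-volume option that can be changed through the gluster CLI. The 10-second value below is only an illustrative choice, not a recommendation made in this thread; pick a timeout that suits your environment.)

  gluster volume set test1 network.ping-timeout 10
  gluster volume info test1    # the reconfigured option should appear under "Options Reconfigured"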
Joe Julian
2015-Apr-07 04:22 UTC
[Gluster-users] Unable to make HA work; mounts hang on remote node reboot
On 04/06/2015 09:00 PM, Ravishankar N wrote:
> On 04/07/2015 04:15 AM, CJ Baar wrote:
>> I am hoping someone can give me some direction on this. I have been searching and trying various tweaks all day. I am trying to set up a two-node cluster with a replicated volume. Each node has a brick under /export and a local mount using glusterfs under /mnt. The volume was created as:
>>   gluster volume create test1 rep 2 g01.x.local:/exports/sdb1/brick g02.x.local:/exports/sdb1/brick
>>   gluster volume start test1
>>   mount -t glusterfs g01.x.local:/test1 /mnt/test1
>> When I write a file to one node, it shows up instantly on the other, just as I expect it to.
>>
>> My problem is that if I reboot one node, the mount on the other completely hangs until the rebooted node comes back up. This seems to defeat the purpose of being highly available. Is there some setting I am missing? How do I keep the volume on a single node alive during a failure?
>> Any info is appreciated. Thank you.
>
> You can explore the network.ping-timeout setting; try reducing it from the default value of 42 seconds.
> -Ravi

That's probably wrong. If you're doing a proper reboot, the services should be stopped before shutting down, which performs all the proper handshaking for closing a TCP connection. This allows the client to avoid the ping-timeout. Ping-timeout only comes into play if there is a sudden, unexpected communication loss with the server, such as power loss or a network partition. Most communication losses should be transient, and recovery is less impactful if you can wait for the transient issue to resolve.

No, if you're hanging when one server is shut down, then your client isn't connecting to all the servers as it should. Check your client logs to figure out why.
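(As a rough illustration of that last suggestion: a FUSE client normally writes its log under /var/log/glusterfs/, named after the mount point, so a mount at /mnt/test1 would typically log to /var/log/glusterfs/mnt-test1.log. Grepping it for connection messages shows whether the mount established sessions to both bricks; the exact log path and translator names here are assumptions based on the volume name used in this thread.)

  # look for connect/disconnect events for each brick's client translator
  grep -iE 'connect|disconnect' /var/log/glusterfs/mnt-test1.log

A healthy replica-2 client should show connections to both test1-client-0 and test1-client-1; if only one of them ever connects, that would explain the hang when the other node reboots.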