Greg Scott
2014-Feb-23 01:44 UTC
[Gluster-users] One node goes offline, the other node loses its connection to its local Gluster volume
We first went down this path back in July 2013 and now I'm back again for more. It's a similar situation but now with new versions of everything. I'm using glusterfs 3.4.2 with Fedora 20.

I have 2 nodes named fw1 and fw2. When I ifdown the NIC I'm using for Gluster on either node, that node cannot see its Gluster volume, but the other node can see it after a timeout. As soon as I ifup that NIC, everyone can see everything again.

Is this expected behavior? When that interconnect drops, I want both nodes to see their own local copy and then sync everything back up when the interconnect connects again.

Here are the details. Node fw1 has an XFS filesystem named gluster-fw1. Node fw2 has an XFS filesystem named gluster-fw2. Those are both Gluster bricks, and both nodes mount the volume as /firewall-scripts, so anything one node does in /firewall-scripts should also show up on the other node within a few milliseconds. The test is to isolate the nodes from each other and see if they can still access their own local copy of /firewall-scripts. The easiest way to do this is to ifdown the interconnect NIC. But this doesn't work.

Here is what happens when I ifdown the NIC on node fw1. Node fw2 can see /firewall-scripts but fw1 shows an error. When I ifdown on fw2, the behavior is identical, with fw1 and fw2 swapped.

On fw1, after an ifdown I lose the connection to my Gluster filesystem:

[root at stylmark-fw1 firewall-scripts]# ifdown enp5s4
[root at stylmark-fw1 firewall-scripts]# ls /firewall-scripts
ls: cannot access /firewall-scripts: Transport endpoint is not connected
[root at stylmark-fw1 firewall-scripts]# df -h
df: '/firewall-scripts': Transport endpoint is not connected
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root           17G  2.2G   14G  14% /
devtmpfs                         989M     0  989M   0% /dev
tmpfs                            996M     0  996M   0% /dev/shm
tmpfs                            996M  564K  996M   1% /run
tmpfs                            996M     0  996M   0% /sys/fs/cgroup
tmpfs                            996M     0  996M   0% /tmp
/dev/sda2                        477M   87M  362M  20% /boot
/dev/sda1                        200M  9.6M  191M   5% /boot/efi
/dev/mapper/fedora-gluster--fw1  9.8G   33M  9.8G   1% /gluster-fw1
10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
[root at stylmark-fw1 firewall-scripts]#

But on fw2, I can still look at it:

[root at stylmark-fw2 ~]# ls /firewall-scripts
allow-all           failover-monitor.sh  rcfirewall.conf
allow-all-with-nat  initial_rc.firewall  start-failover-monitor.sh
etc                 rc.firewall          var
[root at stylmark-fw2 ~]#
[root at stylmark-fw2 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root           17G  2.3G   14G  14% /
devtmpfs                         989M     0  989M   0% /dev
tmpfs                            996M     0  996M   0% /dev/shm
tmpfs                            996M  560K  996M   1% /run
tmpfs                            996M     0  996M   0% /sys/fs/cgroup
tmpfs                            996M     0  996M   0% /tmp
/dev/sda2                        477M   87M  362M  20% /boot
/dev/sda1                        200M  9.6M  191M   5% /boot/efi
/dev/mapper/fedora-gluster--fw2  9.8G   33M  9.8G   1% /gluster-fw2
192.168.253.2:/firewall-scripts  9.8G   33M  9.8G   1% /firewall-scripts
10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
[root at stylmark-fw2 ~]#

And back to fw1 - after an ifup, I can see it again:

[root at stylmark-fw1 firewall-scripts]# ifup enp5s4
[root at stylmark-fw1 firewall-scripts]#
[root at stylmark-fw1 firewall-scripts]# ls /firewall-scripts
allow-all           failover-monitor.sh  rcfirewall.conf
allow-all-with-nat  initial_rc.firewall  start-failover-monitor.sh
etc                 rc.firewall          var
[root at stylmark-fw1 firewall-scripts]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/fedora-root           17G  2.2G   14G  14% /
devtmpfs                         989M     0  989M   0% /dev
tmpfs                            996M     0  996M   0% /dev/shm
tmpfs                            996M  564K  996M   1% /run
tmpfs                            996M     0  996M   0% /sys/fs/cgroup
tmpfs                            996M     0  996M   0% /tmp
/dev/sda2                        477M   87M  362M  20% /boot
/dev/sda1                        200M  9.6M  191M   5% /boot/efi
/dev/mapper/fedora-gluster--fw1  9.8G   33M  9.8G   1% /gluster-fw1
192.168.253.1:/firewall-scripts  9.8G   33M  9.8G   1% /firewall-scripts
10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
[root at stylmark-fw1 firewall-scripts]#

What can I do about this?

Thanks

- Greg Scott
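For context, a two-node replica setup like the one described above is typically built along these lines. This is only a sketch inferred from the brick paths and mount sources shown elsewhere in the thread, not Greg's actual commands:

# On fw1, after glusterd is running on both nodes:
gluster peer probe 192.168.253.2

# Create a 2-way replicated volume from the two XFS bricks
# (brick paths as shown in the "gluster volume info" output later in the thread)
gluster volume create firewall-scripts replica 2 \
    192.168.253.1:/gluster-fw1/gluster-fw1 \
    192.168.253.2:/gluster-fw2/gluster-fw2
gluster volume start firewall-scripts

# Each node then FUSE-mounts the volume from its own interconnect address:
mount -t glusterfs 192.168.253.1:/firewall-scripts /firewall-scripts   # on fw1
mount -t glusterfs 192.168.253.2:/firewall-scripts /firewall-scripts   # on fw2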
Chalcogen
2014-Feb-23 10:46 UTC
[Gluster-users] One node goes offline, the other node loses its connection to its local Gluster volume
I'm not from the glusterfs development team or anything, but I, too, started with glusterfs somewhere around the time frame you mention, and I also work with a twin-replicated setup just like yours. When I do what you describe here on my setup, the command initially hangs on both servers for about as long as the peer ping timeout (which defaults to 42 seconds or so). After that it works. If there are new bugs in this setup I would be interested, in part because the stability of my product depends on this, too.

Could you share your gluster volume info and gluster volume status? Also, what did heal info say before you performed this exercise?

Thanks,
Anirban

On Sunday 23 February 2014 07:14 AM, Greg Scott wrote:
> I have 2 nodes named fw1 and fw2. When I ifdown the NIC I'm using for
> Gluster on either node, that node cannot see its Gluster volume, but
> the other node can see it after a timeout. As soon as I ifup that
> NIC, everyone can see everything again.
>
> Is this expected behavior? When that interconnect drops, I want both
> nodes to see their own local copy and then sync everything back up
> when the interconnect connects again.
>
> [...]
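For anyone following along, the information Anirban is asking for comes from the standard gluster CLI on either node. A quick sketch, using the volume name that appears later in the thread:

# Volume layout and any reconfigured options
gluster volume info firewall-scripts

# Brick, NFS server, and self-heal daemon status with ports and PIDs
gluster volume status firewall-scripts

# Files the self-heal daemon still has pending, plus any split-brain entries
gluster volume heal firewall-scripts info
gluster volume heal firewall-scripts info split-brain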
Greg Scott
2014-Mar-06 22:02 UTC
[Gluster-users] One node goes offline, the other node loses its connection to its local Gluster volume
Sorry Anirban, I didn't mean to disappear into a black hole a couple of weeks ago. I've been away from this for a while and only now have a chance to look at the replies. One suggestion was to try an iptables rule instead of ifdown to simulate my outage; I'll try that in a little while and post results. Meantime, for Anirban - here are gluster volume info and gluster volume status from both nodes. I don't know how to answer your heal info question.

From fw2:

[root at stylmark-fw2 ~]# gluster volume info

Volume Name: firewall-scripts
Type: Replicate
Volume ID: 4928aacb-85eb-40f3-9969-2bfe7f23f08d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.253.1:/gluster-fw1/gluster-fw1
Brick2: 192.168.253.2:/gluster-fw2/gluster-fw2
Options Reconfigured:
network.ping-timeout: 5
[root at stylmark-fw2 ~]#
[root at stylmark-fw2 ~]# gluster volume status
Status of volume: firewall-scripts
Gluster process                               Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.253.1:/gluster-fw1/gluster-fw1  49152   Y       1303
Brick 192.168.253.2:/gluster-fw2/gluster-fw2  49152   Y       1294
NFS Server on localhost                       2049    Y       1298
Self-heal Daemon on localhost                 N/A     Y       1304
NFS Server on 192.168.253.1                   2049    Y       1309
Self-heal Daemon on 192.168.253.1             N/A     Y       1314

There are no active volume tasks
[root at stylmark-fw2 ~]#

And from fw1:

[root at stylmark-fw1 ~]# gluster volume linfo
unrecognized word: linfo (position 1)
[root at stylmark-fw1 ~]# gluster volume info

Volume Name: firewall-scripts
Type: Replicate
Volume ID: 4928aacb-85eb-40f3-9969-2bfe7f23f08d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.253.1:/gluster-fw1/gluster-fw1
Brick2: 192.168.253.2:/gluster-fw2/gluster-fw2
Options Reconfigured:
network.ping-timeout: 5
[root at stylmark-fw1 ~]#
[root at stylmark-fw1 ~]# gluster volume status
Status of volume: firewall-scripts
Gluster process                               Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.253.1:/gluster-fw1/gluster-fw1  49152   Y       1303
Brick 192.168.253.2:/gluster-fw2/gluster-fw2  49152   Y       1294
NFS Server on localhost                       2049    Y       1309
Self-heal Daemon on localhost                 N/A     Y       1314
NFS Server on 192.168.253.2                   2049    Y       1298
Self-heal Daemon on 192.168.253.2             N/A     Y       1304

There are no active volume tasks
[root at stylmark-fw1 ~]#

- Greg

From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Chalcogen
Sent: Sunday, February 23, 2014 4:46 AM
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] One node goes offline, the other node loses its connection to its local Gluster volume

> Do you think you could share your gluster volume info and gluster volume
> status? Also, what did heal info say before you performed this exercise?
> [...]
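For what it's worth, the iptables test mentioned above would presumably look something like the sketch below. It drops traffic to and from the peer on the interconnect while leaving the local 192.168.253.x address configured, which is closer to a real cable or switch failure than ifdown. The interface name and peer address are taken from the thread; the rules themselves are only an illustration, not something anyone in the thread posted:

# On fw1: block the peer (fw2) on the interconnect instead of downing the NIC
iptables -I INPUT  -i enp5s4 -s 192.168.253.2 -j DROP
iptables -I OUTPUT -o enp5s4 -d 192.168.253.2 -j DROP

# ... run the test (ls /firewall-scripts, df -h, writes on both nodes) ...

# Undo the simulated outage
iptables -D INPUT  -i enp5s4 -s 192.168.253.2 -j DROP
iptables -D OUTPUT -o enp5s4 -d 192.168.253.2 -j DROP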
Joe Julian
2014-Mar-06 22:18 UTC
[Gluster-users] One node goes offline, the other node loses its connection to its local Gluster volume
On 02/22/2014 05:44 PM, Greg Scott wrote:
> I have 2 nodes named fw1 and fw2. When I ifdown the NIC I'm using for
> Gluster on either node, that node cannot see its Gluster volume, but
> the other node can see it after a timeout. As soon as I ifup that
> NIC, everyone can see everything again.
>
> Is this expected behavior? When that interconnect drops, I want both
> nodes to see their own local copy and then sync everything back up
> when the interconnect connects again.

If a client loses communication on an open TCP connection to a server, there is a timeout period (42 seconds by default) during which the client waits for the communication to resume, because dropping and re-establishing hundreds to potentially tens of thousands of file descriptors and locks is a very expensive process, disruptive to the entire environment.

With the test process you're describing, the clients are connected to both servers' IP addresses (hopefully resolved from hostnames) on the same network. When you down a NIC, that address is no longer available: not only can the remote client not reach it, your local client can't either, because the address no longer exists. In your real-life scenario, an interconnect failure would not remove either machine's IP address, so after the ping-timeout, operations would resume in a split-brain configuration.

As long as no changes are made to the same file on both sides, the self-heal will do exactly what you expect when the connection is re-established. However, what you're counting on is the most common cause of split-brain: each client, connected only to its own server, independently modifies the same file. When the connection is re-established and the self-heal runs, that file is marked as split-brain - inaccessible from the client mount until it's resolved by admin intervention.

You can avoid the split-brain using a couple of quorum techniques; the one that seems to satisfy your requirements would leave your volume read-only for the duration of the outage.
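To make the quorum suggestion concrete, here is roughly what it looks like with the quorum options available in the 3.3/3.4 series. This is a sketch for the volume name used in this thread, not something posted by Joe; with only two nodes, any quorum scheme trades availability for consistency:

# Client-side quorum: with "auto" on a replica-2 volume, writes are only
# allowed while the first brick is reachable; otherwise the mount refuses
# writes instead of accepting changes that would later conflict.
gluster volume set firewall-scripts cluster.quorum-type auto

# Optional server-side quorum: glusterd stops its local bricks when it
# loses quorum of the peer group. With just two peers this needs care,
# since losing one peer already means losing half the cluster.
gluster volume set firewall-scripts cluster.server-quorum-type server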