Ravishankar N
2017-May-29  04:24 UTC
[Gluster-users] Recovering from Arb/Quorum Write Locks
On 05/29/2017 03:36 AM, W Kern wrote:
> So I have a testbed composed of a simple 2+1 replica 3 with ARB.
>
> gluster1, gluster2 and gluster-arb (with shards)
>
> My testing involves some libvirt VMs running continuous write fops on
> a localhost fuse mount on gluster1.
>
> Works great when all the pieces are there. Once I figured out the
> shard tuning, I was really happy with the speed, even with the older
> kit I was using for the testbed. Sharding is a huge win.
>
> So for failure testing I found the following:
>
> If you take down the ARB, the VMs continue to run perfectly and when
> the ARB returns it catches up.
>
> However, if you take down Gluster2 (with the ARB still being up) you
> often (but not always) get a write lock on one or more of the VMs,
> until Gluster2 recovers and heals.
>
> Per the Docs, this Write Lock is evidently EXPECTED behavior with an
> Arbiter to avoid a Split-Brain.

This happens only if gluster2 had previously witnessed some writes that gluster1 hadn't.

> As I understand it, if the Arb thinks that it knows about (and agrees
> with) data that exists on Gluster2 (now down) that should be written
> to Gluster1, it will write lock the volume because the ARB itself
> doesn't have that data and going forward is problematic until
> Gluster2's data is back in the cluster and can bring the volume back
> into proper sync.

Just to elaborate further: if all nodes were up to begin with and there were zero self-heals pending, and you brought down only gluster2, writes must still be allowed. I guess in your case there must have been some pending heals from gluster2 to gluster1 before you brought gluster2 down, due to a network disconnect from the fuse mount to gluster1.

> OK, that is the reality of using a Rep2 + ARB versus a true Rep3
> environment. You get Split-Brain protection but not much increase in
> HA over old school Replica 2.
>
> So I have some questions:
>
> a) In the event that gluster2 had died and we have entered this write
> lock phase, how does one go forward if the Gluster2 outage can't be
> immediately (or remotely) resolved?
>
> At that point I have some hung VMs and annoyed users.
>
> The current quorum settings are:
>
> # gluster volume get VOL all | grep 'quorum'
> cluster.quorum-type auto
> cluster.quorum-count 2
> cluster.server-quorum-type server
> cluster.server-quorum-ratio 0
> cluster.quorum-reads no
>
> Do I simply kill the quorum and the VMs will continue where they
> left off?
>
> gluster volume set VOL cluster.server-quorum-type none
> gluster volume set VOL cluster.quorum-type none
>
> If I do so, should I also kill the ARB (before or after)? Or leave it up?
>
> Or should I switch to quorum-type fixed with a quorum count of 1?

All of this is not recommended because you would risk getting the files into split-brain.

> b) If I WANT to take down Gluster2 for maintenance, how do I prevent
> the quorum write-lock from occurring?
>
> I suppose I could fiddle with the quorum settings as above, but I'd
> like to be able to PAUSE/FLUSH/FSYNC the Volume before taking down
> Gluster2, then unpause and let the volume continue with Gluster1 and
> the ARB providing some sort of protection and to help when Gluster2 is
> returned to the cluster.

I think you should try to find out whether there were self-heals pending to gluster1 before you brought gluster2 down; otherwise the VMs should not have paused.

> c) Does any of the above behaviour change when I switch to GFAPI?

It shouldn't.

Thanks,
Ravi

> Sincerely
>
> -bill
On 5/28/2017 9:24 PM, Ravishankar N wrote:
> Just to elaborate further, if all nodes were up to begin with and
> there were zero self-heals pending, and you brought down only
> gluster2, writes must still be allowed. I guess in your case, there
> must be some pending heals from gluster2 to gluster1 before you
> brought gluster2 down due to a network disconnect from the fuse mount
> to gluster1.

OK, I was aggressively writing within and to those VMs while at the same time pulling cables (power and network). My initial observation was that the shards healed quickly, but perhaps I got too aggressive and didn't wait long enough between tests for the healing to kick in and/or finish. I will retest and pay attention to outstanding heals, both prior to and during the tests.

>> I suppose I could fiddle with the quorum settings as above, but I'd
>> like to be able to PAUSE/FLUSH/FSYNC the Volume before taking down
>> Gluster2, then unpause and let the volume continue with Gluster1 and
>> the ARB providing some sort of protection and to help when Gluster2
>> is returned to the cluster.
>
> I think you should try to find if there were self-heals pending to
> gluster1 before you brought gluster2 down or the VMs should not have
> paused.

Yes, I'll start looking at heals PRIOR to yanking cables.

OK, can I assume SOME pause is expected when Gluster first sees gluster2 go down, which would unpause after a timeout period? I have seen that behaviour as well.

-bill
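The thread never names the timeout behind this short pause. One plausible candidate, offered here as an assumption and not something either poster confirmed, is the client-side `network.ping-timeout` volume option, which controls how long a client waits before declaring an unresponsive brick disconnected. A minimal shell sketch for reading a single option out of `gluster volume get` output follows; the helper name `option_value` is hypothetical, and it assumes the usual two-column "Option / Value" listing, as in the quorum output quoted earlier in the thread.

```shell
# Hypothetical helper: print the value column for one option from
# `gluster volume get` output supplied on stdin.
# Assumes each option appears as "<name> <whitespace> <value>" on its own line.
option_value() {
    awk -v opt="$1" '$1 == opt { print $2 }'
}

# Usage, with VOL as the volume name used in the thread:
#   gluster volume get VOL network.ping-timeout | option_value network.ping-timeout
```

If the value printed roughly matches the length of the pause seen when gluster2 drops, that would support the assumption; if not, the pause has another cause.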
On 5/28/2017 9:24 PM, Ravishankar N wrote:
> I think you should try to find if there were self-heals pending to
> gluster1 before you brought gluster2 down or the VMs should not have
> paused.

Yes, if I watch for and then force outstanding heals (if the self-heal hasn't kicked in) prior to shutting down a node, the write lock does not occur. I only get the 'timeout' pause.

So there was no real problem. I had an aggressive write and failure sequence and misunderstood how and when heals occur.

So all is good.

-bill
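The pre-shutdown check described above (look for outstanding heals, force them if needed, and only then take a node down) can be sketched in shell. This is an illustration under stated assumptions, not a procedure given in the thread: the function names are hypothetical, and the parsing assumes that `gluster volume heal VOL info` prints one "Number of entries: N" line per brick, with "-" in place of N for a brick that is unreachable.

```shell
# Hypothetical helper: sum the per-brick "Number of entries: N" lines from
# `gluster volume heal VOL info` output supplied on stdin.
# Non-numeric counts (e.g. "-" for an unreachable brick) are skipped.
pending_heal_count() {
    awk '/^Number of entries:/ { if ($4 ~ /^[0-9]+$/) total += $4 }
         END { print total + 0 }'
}

# Hypothetical helper: succeed only when no heal entries are pending on the volume.
safe_to_stop_brick() {
    vol=$1
    [ "$(gluster volume heal "$vol" info | pending_heal_count)" -eq 0 ]
}

# Usage, with VOL as the volume name used in the thread:
#   gluster volume heal VOL            # ask the self-heal daemon to start healing now
#   safe_to_stop_brick VOL && echo "no pending heals, OK to take gluster2 down"
```

Because an unreachable brick is skipped and not counted, the check is only meaningful while all bricks are still up, which is the situation before planned maintenance.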