On 8/25/2017 12:56 AM, Gionatan Danti wrote:
>
>> WK wrote:
>> 2 node plus Arbiter. You NEED the arbiter or a third node. Do NOT try 2
>> node with a VM
>
> This is true even if I manage locking at application level (via
> virtlockd or sanlock)?

We ran Rep2 for years on 3.4. It does work if you are really, really
careful, but in a crash on one side you might have lost some bits that
were in flight. The VM would then try to heal. Without sharding, big
VMs take a while because the WHOLE VM file has to be copied over. Then
you might get split-brain and have to stop the VM, pick the good copy,
make sure that is healed on both sides and then restart the VM.

Arbiter/Replica 3 prevents that. Sharding helps a lot as well by making
the heals really quick, though in a Replica 2 with sharding you no
longer have a nice big .img file sitting on each brick in plain view,
and picking a split-brain winner is now WAY more complicated. You would
have to re-assemble things.

We were quite good at fixing broken Gluster 3.4 nodes, but we are
*much* happier with the Arbiter node and sharding. It is a huge
difference. We could go to Rep3, but we like the extra speed and we are
comfortable with the Arb limitations (we also have excellent
off-cluster backups <grin>).

> Also, on a two-node setup it is *guaranteed* for updates to one node
> to put offline the whole volume?

If you still have quorum turned on, then yes. One side goes and you are
down.

> On the other hand, a 3-way setup (or 2+arbiter) is free from all these
> problems?

Yes, you can lose one of the three nodes and, after the pause,
everything just continues. If you have a second failure before you can
recover, then you have lost quorum.

If that second failure is the other actual replica, then you could get
into a situation where the arbiter isn't happy with either copy when
you come back up, and of course the arbiter doesn't have a good copy
itself. Pavel alluded to something like that when describing his
problem.

That is where replica 3 helps. In theory, with replica 3, you could
lose 2 nodes and still have a reasonable copy of your VM, though you've
lost quorum and are still down. At that point, *I* would kill the two
bad nodes (STONITH) to prevent them from coming back AND turn off
quorum. You could then run on the single node until you can save/copy
those VM images, preferably by migrating off that volume completely.
Create a remote pool using SSHFS if you have nothing else available.
THEN I would go back and fix the gluster cluster and migrate back into
it.

Replica2/Replica3 does not matter if you lose your Gluster network
switch, but again the Arb or Rep3 setup makes it easier to recover. I
suppose the only advantage of Replica2 is that you can use a crossover
cable and not worry about losing the switch, but bonding/teaming works
well and there are bonding modes that don't require the same switch for
the bond slaves, so you can build in some redundancy there as well.
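For concreteness, a 2-node-plus-arbiter volume with sharding and quorum
along the lines WK describes could be created roughly as below. This is
only a sketch: the volume name (vmstore), host names (node1, node2,
arb1) and brick paths are made up for illustration, and the shard block
size is just an example value.

    # create a replica 3 volume where the third brick is a metadata-only arbiter
    gluster volume create vmstore replica 3 arbiter 1 \
        node1:/bricks/brick1/vmstore \
        node2:/bricks/brick1/vmstore \
        arb1:/bricks/brick1/vmstore

    # enable sharding so heals copy only the changed shards, not the whole image
    gluster volume set vmstore features.shard on
    gluster volume set vmstore features.shard-block-size 64MB

    # client- and server-side quorum, so a lone surviving node stops accepting writes
    gluster volume set vmstore cluster.quorum-type auto
    gluster volume set vmstore cluster.server-quorum-type server

    gluster volume start vmstore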
On 25-08-2017 21:48, WK wrote:
> On 8/25/2017 12:56 AM, Gionatan Danti wrote:
>
> We ran Rep2 for years on 3.4. It does work if you are really, really
> careful, but in a crash on one side you might have lost some bits
> that were in flight. The VM would then try to heal.
> Without sharding, big VMs take a while because the WHOLE VM file has
> to be copied over. Then you might get split-brain and have to stop the
> VM, pick the good copy, make sure that is healed on both sides and then
> restart the VM.

Ok, so sharding needs to be enabled for VM disk storage, otherwise heal
time skyrockets.

> Arbiter/Replica 3 prevents that. Sharding helps a lot as well by
> making the heals really quick, though in a Replica 2 with sharding you
> no longer have a nice big .img file sitting on each brick in plain
> view, and picking a split-brain winner is now WAY more complicated. You
> would have to re-assemble things.

This concerns me, and it is the reason I would like to avoid sharding.
How can I recover from such a situation? How can I "decide" which
(reconstructed) file is the one to keep rather than the one to delete?

> We were quite good at fixing broken Gluster 3.4 nodes, but we are
> *much* happier with the Arbiter node and sharding. It is a huge
> difference.
> We could go to Rep3, but we like the extra speed and we are comfortable
> with the Arb limitations (we also have excellent off-cluster backups
> <grin>).
>
>> Also, on a two-node setup it is *guaranteed* for updates to one node
>> to put offline the whole volume?
>
> If you still have quorum turned on, then yes. One side goes and you are
> down.
>
>> On the other hand, a 3-way setup (or 2+arbiter) is free from all these
>> problems?
>
> Yes, you can lose one of the three nodes and, after the pause,
> everything just continues. If you have a second failure before you can
> recover, then you have lost quorum.
>
> If that second failure is the other actual replica, then you could get
> into a situation where the arbiter isn't happy with either copy when
> you come back up, and of course the arbiter doesn't have a good copy
> itself. Pavel alluded to something like that when describing his
> problem.
>
> That is where replica 3 helps. In theory, with replica 3, you could
> lose 2 nodes and still have a reasonable copy of your VM, though
> you've lost quorum and are still down. At that point, *I* would kill
> the two bad nodes (STONITH) to prevent them from coming back AND turn
> off quorum. You could then run on the single node until you can
> save/copy those VM images, preferably by migrating off that volume
> completely. Create a remote pool using SSHFS if you have nothing else
> available. THEN I would go back and fix the gluster cluster and
> migrate back into it.
>
> Replica2/Replica3 does not matter if you lose your Gluster network
> switch, but again the Arb or Rep3 setup makes it easier to recover. I
> suppose the only advantage of Replica2 is that you can use a crossover
> cable and not worry about losing the switch, but bonding/teaming
> works well and there are bonding modes that don't require the same
> switch for the bond slaves, so you can build in some redundancy there
> as well.

Thank you for the very valuable information.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
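As an aside on the "re-assemble things" point above: with sharding,
only the first block of a VM image keeps its original name on the
brick; the remaining blocks live in the hidden .shard directory at the
brick root, named after the base file's GFID. A hypothetical listing
(brick path, file name and GFID invented for illustration) would look
something like:

    # the base file on the brick holds only the first shard
    ls /bricks/brick1/vmstore/images/
    vm01.img

    # the remaining shards are named <gfid-of-base-file>.<N>
    ls /bricks/brick1/vmstore/.shard/ | head -3
    6a9e6e23-8c4f-4b2d-9d3a-1f2e3d4c5b6a.1
    6a9e6e23-8c4f-4b2d-9d3a-1f2e3d4c5b6a.2
    6a9e6e23-8c4f-4b2d-9d3a-1f2e3d4c5b6a.3

which is why picking a split-brain winner by hand means inspecting
every shard rather than one big file per VM.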
lemonnierk at ulrar.net
2017-Aug-25 21:21 UTC
[Gluster-users] GlusterFS as virtual machine storage
> This concerns me, and it is the reason I would like to avoid sharding.
> How can I recover from such a situation? How can I "decide" which
> (reconstructed) file is the one to keep rather than the one to delete?

No need, on a replica 3 that just doesn't happen. That's the main
advantage of it, that and the fact that you can perform operations on
your servers without having the volume go down.

For a replica 2 though, it will happen. With or without sharding the
operation is the same: it involves fiddling with gfids and is a bit
annoying, but not that hard for one file. With sharding enabled,
though, you'll need to pick out each split-brained shard, which I
imagine is a huge pain.

Again, just don't do 2 nodes, it's a _bad_ idea. Add at the very least
an arbiter.
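For what it's worth, more recent Gluster releases also expose
split-brain resolution through the CLI, which avoids hand-editing the
gfid links on the bricks. A rough sketch, with a made-up volume name,
file path and brick; check the documentation for your version before
relying on it:

    # list files (or shards) currently in split-brain
    gluster volume heal vmstore info split-brain

    # resolve one file by keeping the copy with the newest mtime...
    gluster volume heal vmstore split-brain latest-mtime /images/vm01.img

    # ...or by explicitly choosing which brick holds the good copy
    gluster volume heal vmstore split-brain source-brick node1:/bricks/brick1/vmstore /images/vm01.img

With sharding enabled, the entries reported by "info split-brain" are
the individual shards, so each affected shard has to be resolved (or
scripted over) separately.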