On 8/25/2017 12:56 AM, Gionatan Danti wrote:
>
>> WK wrote:
>> 2 node plus Arbiter. You NEED the arbiter or a third node. Do NOT try 2
>> node with a VM
>
> This is true even if I manage locking at application level (via
> virtlockd or sanlock)?

We ran Rep2 for years on 3.4. It does work if you are really, really
careful, but in a crash on one side you might have lost some bits that
were in flight. The VM would then try to heal. Without sharding, big
VMs take a while because the WHOLE VM file has to be copied over. Then
you might get split-brain and have to stop the VM, pick the good copy,
make sure that is healed on both sides and then restart the VM.

Arbiter/Replica 3 prevents that. Sharding helps a lot as well by making
the heals really quick, though in a Replica 2 with sharding you no
longer have a nice big .img file sitting on each brick in plain view,
and picking a split-brain winner is now WAY more complicated. You would
have to re-assemble things.

We were quite good at fixing broken Gluster 3.4 nodes, but we are
*much* happier with the Arbiter node and sharding. It is a huge
difference. We could go to Rep3, but we like the extra speed and we are
comfortable with the Arb limitations (we also have excellent
off-cluster backups <grin>).

> Also, on a two-node setup it is *guaranteed* for updates to one node
> to put offline the whole volume?

If you still have quorum turned on, then yes. One side goes and you are
down.

> On the other hand, a 3-way setup (or 2+arbiter) is free from all these
> problems?

Yes, you can lose one of the three nodes and, after the pause,
everything just continues. If you have a second failure before you can
recover, then you have lost quorum.

If that second failure is the other actual replica, then you could get
into a situation where the arbiter isn't happy with either copy when
you come back up, and of course the arbiter doesn't have a good copy
itself. Pavel alluded to something like that when describing his
problem.

That is where replica 3 helps. In theory, with replica 3, you could
lose 2 nodes and still have a reasonable copy of your VM, though you've
lost quorum and are still down. At that point, *I* would kill the two
bad nodes (STONITH) to prevent them from coming back AND turn off
quorum. You could then run on the single node until you can save/copy
those VM images, preferably by migrating off that volume completely.
Create a remote pool using SSHFS if you have nothing else available.
THEN I would go back and fix the gluster cluster and migrate back into
it.

Replica2/Replica3 does not matter if you lose your Gluster network
switch, but again the Arb or Rep3 setup makes it easier to recover. I
suppose the only advantage of Replica2 is that you can use a crossover
cable and not worry about losing the switch, but bonding/teaming works
well and there are bonding modes that don't require the same switch for
the bond slaves, so you can build in some redundancy there as well.
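For concreteness, a 2-node-plus-arbiter volume with sharding and quorum
along the lines WK describes could be created roughly as below. This is
only a sketch: the volume name (vmstore), host names (node1, node2,
arb1) and brick paths are made up for illustration, and the shard block
size is just an example value.

    # create a replica 3 volume where the third brick is a metadata-only arbiter
    gluster volume create vmstore replica 3 arbiter 1 \
        node1:/bricks/brick1/vmstore \
        node2:/bricks/brick1/vmstore \
        arb1:/bricks/brick1/vmstore

    # enable sharding so heals copy only the changed shards, not the whole image
    gluster volume set vmstore features.shard on
    gluster volume set vmstore features.shard-block-size 64MB

    # client- and server-side quorum, so a lone surviving node stops accepting writes
    gluster volume set vmstore cluster.quorum-type auto
    gluster volume set vmstore cluster.server-quorum-type server

    gluster volume start vmstore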
On 25-08-2017 21:48, WK wrote:
> On 8/25/2017 12:56 AM, Gionatan Danti wrote:
>
> We ran Rep2 for years on 3.4. It does work if you are really, really
> careful, but in a crash on one side you might have lost some bits
> that were in flight. The VM would then try to heal.
> Without sharding, big VMs take a while because the WHOLE VM file has
> to be copied over. Then you might get split-brain and have to stop the
> VM, pick the good copy, make sure that is healed on both sides and then
> restart the VM.

Ok, so sharding needs to be enabled for VM disk storage, otherwise heal
time skyrockets.

> Arbiter/Replica 3 prevents that. Sharding helps a lot as well by
> making the heals really quick, though in a Replica 2 with sharding you
> no longer have a nice big .img file sitting on each brick in plain
> view, and picking a split-brain winner is now WAY more complicated. You
> would have to re-assemble things.

This concerns me, and it is the reason I would like to avoid sharding.
How can I recover from such a situation? How can I "decide" which
(reconstructed) file is the one to keep rather than the one to delete?

> We were quite good at fixing broken Gluster 3.4 nodes, but we are
> *much* happier with the Arbiter node and sharding. It is a huge
> difference.
> We could go to Rep3, but we like the extra speed and we are comfortable
> with the Arb limitations (we also have excellent off-cluster backups
> <grin>).
>
>> Also, on a two-node setup it is *guaranteed* for updates to one node
>> to put offline the whole volume?
>
> If you still have quorum turned on, then yes. One side goes and you are
> down.
>
>> On the other hand, a 3-way setup (or 2+arbiter) is free from all these
>> problems?
>
> Yes, you can lose one of the three nodes and, after the pause,
> everything just continues. If you have a second failure before you can
> recover, then you have lost quorum.
>
> If that second failure is the other actual replica, then you could get
> into a situation where the arbiter isn't happy with either copy when
> you come back up, and of course the arbiter doesn't have a good copy
> itself. Pavel alluded to something like that when describing his
> problem.
>
> That is where replica 3 helps. In theory, with replica 3, you could
> lose 2 nodes and still have a reasonable copy of your VM, though
> you've lost quorum and are still down. At that point, *I* would kill
> the two bad nodes (STONITH) to prevent them from coming back AND turn
> off quorum. You could then run on the single node until you can
> save/copy those VM images, preferably by migrating off that volume
> completely. Create a remote pool using SSHFS if you have nothing else
> available. THEN I would go back and fix the gluster cluster and
> migrate back into it.
>
> Replica2/Replica3 does not matter if you lose your Gluster network
> switch, but again the Arb or Rep3 setup makes it easier to recover. I
> suppose the only advantage of Replica2 is that you can use a crossover
> cable and not worry about losing the switch, but bonding/teaming
> works well and there are bonding modes that don't require the same
> switch for the bond slaves, so you can build in some redundancy there
> as well.

Thank you for the very valuable information.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
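As an aside on the "re-assemble things" point above: with sharding,
only the first block of a VM image keeps its original name on the
brick; the remaining blocks live in the hidden .shard directory at the
brick root, named after the base file's GFID. A hypothetical listing
(brick path, file name and GFID invented for illustration) would look
something like:

    # the base file on the brick holds only the first shard
    ls /bricks/brick1/vmstore/images/
    vm01.img

    # the remaining shards are named <gfid-of-base-file>.<N>
    ls /bricks/brick1/vmstore/.shard/ | head -3
    6a9e6e23-8c4f-4b2d-9d3a-1f2e3d4c5b6a.1
    6a9e6e23-8c4f-4b2d-9d3a-1f2e3d4c5b6a.2
    6a9e6e23-8c4f-4b2d-9d3a-1f2e3d4c5b6a.3

which is why picking a split-brain winner by hand means inspecting
every shard rather than one big file per VM.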
lemonnierk at ulrar.net
2017-Aug-25 21:21 UTC
[Gluster-users] GlusterFS as virtual machine storage
> This concerns me, and it is the reason I would like to avoid sharding.
> How can I recover from such a situation? How can I "decide" which
> (reconstructed) file is the one to keep rather than the one to delete?

No need, on a replica 3 that just doesn't happen. That's the main
advantage of it, that and the fact that you can perform operations on
your servers without having the volume go down.

For a replica 2 though, it will happen. With or without sharding the
operation is the same: it involves fiddling with gfids and is a bit
annoying, but not that hard for one file. With sharding enabled,
though, you'll need to pick out each split-brained shard, which I
imagine is a huge pain.

Again, just don't do 2 nodes, it's a _bad_ idea. Add at the very least
an arbiter.
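For what it's worth, more recent Gluster releases also expose
split-brain resolution through the CLI, which avoids hand-editing the
gfid links on the bricks. A rough sketch, with a made-up volume name,
file path and brick; check the documentation for your version before
relying on it:

    # list files (or shards) currently in split-brain
    gluster volume heal vmstore info split-brain

    # resolve one file by keeping the copy with the newest mtime...
    gluster volume heal vmstore split-brain latest-mtime /images/vm01.img

    # ...or by explicitly choosing which brick holds the good copy
    gluster volume heal vmstore split-brain source-brick node1:/bricks/brick1/vmstore /images/vm01.img

With sharding enabled, the entries reported by "info split-brain" are
the individual shards, so each affected shard has to be resolved (or
scripted over) separately.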