Niklaus Hofer
2017-Jan-30 11:25 UTC
[Gluster-users] Expected behaviour of hypervisor on Gluster node loss
Hi

I have a question concerning the 'correct' behaviour of GlusterFS:

We have a nice Gluster setup up and running. Most things are working nicely. Our setup is as follows:
- Storage is a 2+1 Gluster setup (2 replicating hosts + 1 arbiter) with a volume for virtual machines.
- Two virtualisation hosts running libvirt / qemu / kvm.

Now the question is: what is supposed to happen when we unplug one of the storage nodes (i.e. a power outage in one of our data centres)? Initially we were hoping that the virtualisation hosts would automatically switch over to the second storage node and keep all VMs running.

However, during our tests we found that this is not the case. Instead, when we unplug one of the storage nodes, the virtual machines run into all sorts of problems: they become unable to read/write, applications crash, and filesystems even get corrupted. That is of course not acceptable.

Reading the documentation again, we now think we have misunderstood what we are supposed to be doing. To our understanding, what should happen is this:
- If the virtualisation host is connected to the storage node which is still running:
  - everything is fine and the VM keeps running
- If the virtualisation host was connected to the storage node which is now absent:
  - qemu is supposed to 'pause' / 'freeze' the VM
  - the virtualisation host waits for the ping timeout
  - the virtualisation host switches over to the other storage node
  - qemu 'unpauses' the VMs
  - the VM is fully operational again

Does my description match the 'optimal' GlusterFS behaviour?

Greets
Niklaus Hofer
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern
Telefon: +41 31 332 53 63
www.stepping-stone.ch
niklaus.hofer at stepping-stone.ch
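[Editor's note: for readers hitting the symptoms described above, the usual tuning for VM workloads on a replica 2 + arbiter volume looks roughly like the sketch below. This is a hedged example, not the original poster's configuration: the volume name "vmstore" is hypothetical, and the ping-timeout value is illustrative. Note that with the native FUSE client there is no "switching over" between servers; the client talks to all bricks at once, and after `network.ping-timeout` expires it simply continues with the surviving bricks, which is why guest I/O stalls for up to that timeout.]

```shell
# Sketch of VM-friendly settings for a replica 2 + arbiter volume.
# "vmstore" is a hypothetical volume name; adjust for your setup.

# Apply the upstream 'virt' option group (sharding, eager locking,
# cache settings) recommended for volumes that host VM images:
gluster volume set vmstore group virt

# Enforce quorum so one surviving data brick plus the arbiter keeps
# the volume writable while the unplugged node is gone:
gluster volume set vmstore cluster.quorum-type auto
gluster volume set vmstore cluster.server-quorum-type server

# Time a client waits before declaring a brick dead (default 42s).
# Guest I/O stalls for this long; guests whose disk timeouts are
# shorter may remount their filesystems read-only, so either lower
# this or raise the SCSI timeout inside the guests.
gluster volume set vmstore network.ping-timeout 30
```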
Vijay Bellur
2017-Feb-01 04:35 UTC
[Gluster-users] Expected behaviour of hypervisor on Gluster node loss
On Mon, Jan 30, 2017 at 6:25 AM, Niklaus Hofer <niklaus.hofer at stepping-stone.ch> wrote:

> Hi
>
> I have a question concerning the 'correct' behaviour of GlusterFS:
>
> We have a nice Gluster setup up and running. Most things are working nicely.
> Our setup is as follows:
> - Storage is a 2+1 Gluster setup (2 replicating hosts + 1 arbiter) with a
>   volume for virtual machines.
> - Two virtualisation hosts running libvirt / qemu / kvm.

Are you using something like oVirt or Proxmox for managing your virtualization cluster?

> Now the question is, what is supposed to happen when we unplug one of the
> storage nodes (aka power outage in one of our data centers)?
> Initially we were hoping that the virtualisation hosts would automatically
> switch over to the second storage node and keep all VMs running.
>
> However, during our tests, we have found that this is not the case.
> Instead, when we unplug one of the storage nodes, the virtual machines run
> into all sorts of problems; being unable to read/write, crashing
> applications and even corrupting the filesystem. That is of course not
> acceptable.
>
> Reading the documentation again, we now think that we have misunderstood
> what we're supposed to be doing. To our understanding, what should happen
> is this:
> - If the virtualisation host is connected to the storage node which is
>   still running:
>   - everything is fine and the VM keeps running
> - If the virtualisation host was connected to the storage node which is
>   now absent:
>   - qemu is supposed to 'pause' / 'freeze' the VM
>   - Virtualisation host waits for ping timeout
>   - Virtualisation host switches over to the other storage node
>   - qemu 'unpauses' the VMs
>   - The VM is fully operational again
>
> Does my description match the 'optimal' GlusterFS behaviour?

Can you provide more details about your gluster volume configuration and the options enabled on the volume?

Regards,
Vijay
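[Editor's note: the details Vijay asks for can be gathered with the standard gluster CLI commands sketched below. The volume name "vmstore" is hypothetical; substitute the real volume name.]

```shell
# Collect the volume details requested in the reply.
# "vmstore" is a hypothetical volume name.

gluster volume info vmstore       # volume type, brick layout, reconfigured options
gluster volume status vmstore     # which bricks and brick processes are online
gluster volume get vmstore all    # every option with its effective value (GlusterFS >= 3.7)
gluster volume heal vmstore info  # entries still pending self-heal after the outage
```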