Vince Loschiavo
2014-Jul-31 16:22 UTC
[Gluster-users] Virt-store use case - HA failure issue - suggestions needed
I'm currently testing Gluster 3.5.1 in a two-server QEMU/KVM environment.

CentOS 6.5: two servers (KVM07 & KVM08), a two-brick (one brick per server) replicated volume.

I've tuned the volume per the documentation here:
http://gluster.org/documentation/use_cases/Virt-store-usecase/

I have the gluster volume fuse-mounted on KVM07 and KVM08 and am using it to store raw disk images.

KVM is using the fuse-mounted volume as a "dir: Filesystem Directory" storage pool.

With dynamic_ownership = 0 set in /etc/libvirt/qemu.conf and the files chown-ed to qemu:qemu, live migration works great.

Problem:
If I need to take down one of these servers for maintenance, I live-migrate the VMs to the other server, run

    service gluster stop

and then kill all the remaining gluster and brick processes.

At this point, the VMs die. The fuse mount recovers and remains attached to the volume via the other server, but the virt disk images are not fully synced.

This causes the VMs to go into a read-only filesystem state, then kernel panic. Reboots/restarts of the VMs just cause kernel panics. This effectively brings down the two-node cluster.

Bringing the gluster node / bricks / etc. back up prompts a self-heal. Once the self-heal completes, the VMs can boot normally.

Question: is there a better way to accomplish HA with live/running virt images? The goal is to be able to bring down either server in the pair and perform maintenance without interrupting the VMs.

I assume my shutdown process is flawed but haven't been able to find a better one.

Any suggestions are welcome.

--
-Vince Loschiavo
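For anyone reproducing the setup above, it amounts to roughly the following sketch. The volume name (vmstore), brick hosts, and mount point are placeholders, not taken from the original post, and the option names are as I recall them for glusterfs 3.5:

    # Tune the volume per the Virt-store guide (applies the shipped "virt"
    # option group, which includes cluster.quorum-type=auto among others)
    gluster volume set vmstore group virt

    # Fuse-mount the volume on each hypervisor, naming the peer as a
    # fallback volfile server so the mount survives one server going away
    mount -t glusterfs -o backup-volfile-servers=kvm08 \
        kvm07:/vmstore /var/lib/libvirt/images

    # In /etc/libvirt/qemu.conf, stop libvirt from re-owning image files:
    #   dynamic_ownership = 0
    # then hand the images to the qemu user so live migration works
    chown -R qemu:qemu /var/lib/libvirt/images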
Jason Brooks
2014-Jul-31 18:52 UTC
[Gluster-users] Virt-store use case - HA failure issue - suggestions needed
----- Original Message -----
> From: "Vince Loschiavo" <vloschiavo at gmail.com>
> To: gluster-users at gluster.org
> Sent: Thursday, July 31, 2014 9:22:16 AM
> Subject: [Gluster-users] Virt-store use case - HA failure issue - suggestions needed
>
> [...]
>
> Problem:
> If I need to take down one of these servers for maintenance, I live migrate
> the VMs to the other server.
> service gluster stop
> then kill all the remaining gluster and brick processes.

The guide says that quorum-type=auto sets a rule such that at least half of the bricks in the replica group should be up and running; if not, the replica group becomes read-only. I think the rule is actually 51%, so bringing down one of your two servers makes the volume read-only. If you want to stay with two servers, you need to unset this rule. Better to add a third server and a third replica, though.

Regards,

Jason
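Concretely, the two options Jason describes would look something like the following. The volume name (vmstore), the third host (kvm09), and the brick path are hypothetical:

    # Option 1: drop the client-quorum rule so the volume stays writable
    # with only one of the two bricks up (at the cost of split-brain risk)
    gluster volume set vmstore cluster.quorum-type none

    # Option 2 (preferred): add a third server and grow the replica count,
    # so any single server can go down without losing write quorum
    gluster peer probe kvm09
    gluster volume add-brick vmstore replica 3 kvm09:/export/brick1/vmstore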