Vince Loschiavo
2014-Jul-31 16:22 UTC
[Gluster-users] Virt-store use case - HA failure issue - suggestions needed
I'm currently testing Gluster 3.5.1 in a two-server QEMU/KVM environment.

CentOS 6.5: two servers (KVM07 & KVM08), a two-brick (one brick per server) replicated volume.

I've tuned the volume per the documentation here:
http://gluster.org/documentation/use_cases/Virt-store-usecase/

I have the gluster volume fuse-mounted on KVM07 and KVM08 and am using it to store raw disk images.

KVM is using the fuse-mounted volume as a "dir: Filesystem Directory" storage pool.

With dynamic_ownership = 0 set in /etc/libvirt/qemu.conf and the files chown-ed to qemu:qemu, live migration works great.

Problem:
If I need to take down one of these servers for maintenance, I live-migrate the VMs to the other server, run

    service gluster stop

and then kill all the remaining gluster and brick processes.

At this point, the VMs die. The fuse mount recovers and remains attached to the volume via the other server, but the virt disk images are not fully synced.

This causes the VMs to go into a read-only filesystem state, then kernel panic. Reboots/restarts of the VMs just cause kernel panics. This effectively brings down the two-node cluster.

Bringing the gluster node / bricks / etc. back up prompts a self-heal. Once the self-heal completes, the VMs can boot normally.

Question: is there a better way to accomplish HA with live/running virt images? The goal is to be able to bring down either server in the pair and perform maintenance without interrupting the VMs.

I assume my shutdown process is flawed but haven't been able to find a better one.

Any suggestions are welcome.

--
-Vince Loschiavo
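For anyone reproducing the setup above, it amounts to roughly the following sketch. The volume name (vmstore), brick hosts, and mount point are placeholders, not taken from the original post, and the option names are as I recall them for glusterfs 3.5:

    # Tune the volume per the Virt-store guide (applies the shipped "virt"
    # option group, which includes cluster.quorum-type=auto among others)
    gluster volume set vmstore group virt

    # Fuse-mount the volume on each hypervisor, naming the peer as a
    # fallback volfile server so the mount survives one server going away
    mount -t glusterfs -o backup-volfile-servers=kvm08 \
        kvm07:/vmstore /var/lib/libvirt/images

    # In /etc/libvirt/qemu.conf, stop libvirt from re-owning image files:
    #   dynamic_ownership = 0
    # then hand the images to the qemu user so live migration works
    chown -R qemu:qemu /var/lib/libvirt/images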
Jason Brooks
2014-Jul-31 18:52 UTC
[Gluster-users] Virt-store use case - HA failure issue - suggestions needed
----- Original Message -----
> From: "Vince Loschiavo" <vloschiavo at gmail.com>
> To: gluster-users at gluster.org
> Sent: Thursday, July 31, 2014 9:22:16 AM
> Subject: [Gluster-users] Virt-store use case - HA failure issue - suggestions needed
>
> [...]
>
> Problem:
> If I need to take down one of these servers for maintenance, I live migrate
> the VMs to the other server.
> service gluster stop
> then kill all the remaining gluster and brick processes.

The guide says that quorum-type=auto sets a rule such that at least half of the bricks in the replica group should be up and running; if not, the replica group becomes read-only. I think the rule is actually 51%, so bringing down one of your two servers makes the volume read-only. If you want to stay with two servers, you need to unset this rule. Better to add a third server and a third replica, though.

Regards,

Jason
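Concretely, the two options Jason describes would look something like the following. The volume name (vmstore), the third host (kvm09), and the brick path are hypothetical:

    # Option 1: drop the client-quorum rule so the volume stays writable
    # with only one of the two bricks up (at the cost of split-brain risk)
    gluster volume set vmstore cluster.quorum-type none

    # Option 2 (preferred): add a third server and grow the replica count,
    # so any single server can go down without losing write quorum
    gluster peer probe kvm09
    gluster volume add-brick vmstore replica 3 kvm09:/export/brick1/vmstore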