José A. Lausuch Sales
2013-Oct-22 09:42 UTC
[Gluster-users] GlusterFS HA testing feedback
Hi,

we are currently evaluating GlusterFS for a production environment. Our focus is on the high-availability features of GlusterFS. However, our tests have not worked out well, so I am seeking feedback from you.

In our planned production environment, Gluster should provide shared storage for VM disk images. Our very basic initial test setup is as follows:

We are using two servers, each providing a single brick of a replicated gluster volume (Gluster 3.4.1). A third server runs a test VM (Ubuntu 13.04 on QEMU 1.3.0 and libvirt 1.0.3) which uses a disk image file stored on the gluster volume as a block device (/dev/vdb). For testing purposes, the root file system of this VM (/dev/vda) is a disk image NOT stored on the gluster volume.

To test the high-availability features of gluster under load, we run FIO inside the VM directly on the vdb block device (see configuration below). So far we have tested reads only. The test procedure is as follows:

1. We start FIO inside the VM and observe by means of "top" which of the two servers receives the read requests (i.e., increased CPU load of the glusterfsd process). Let's say Server1 shows the CPU load from glusterfsd.

2. While FIO is running, we take down the network of Server1 and observe whether Server2 takes over.

3. This fail-over works almost 100% of the time: we see the CPU load from glusterfsd on Server2. As expected, Server1 does not have any load because it is "offline".

4. After a while we bring up the NIC on Server1 again. Our expectation was that Server1 would take over again (something like active-passive behavior), but this happens only 5-10% of the time; the CPU load stays on Server2.

5. After some time, we bring down the NIC on Server2, expecting Server1 to take over. This second fail-over crashes: the VM complains about I/O errors which can only be resolved by restarting the VM, and sometimes even by removing and re-creating the volume.

After some tests, we realized that if we restart the glusterd daemon (/etc/init.d/glusterd restart) on Server1 after step 3 or before step 4, Server1 takes over automatically, without bringing down Server2 or anything like that.

We tested this with both the normal FUSE mount and libgfapi. With FUSE, the local mount sometimes becomes unavailable (ls no longer shows any files) if the fail-over fails.

We have a few fundamental questions in this regard:

i) Is Gluster supposed to handle such a scenario, or are we making wrong assumptions? The only solution we found is to restart the daemon after a network outage, but this is not acceptable in a real scenario with VMs running real applications.

ii) What is the recommended configuration in terms of caching (QEMU: cache=none/writethrough/writeback) and direct I/O (FIO and Gluster) to maximize the reliability of the fail-over process? We varied these parameters but could not find a working configuration. Do these parameters have an impact at all?
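For reference, "taking down the network" in steps 2, 4 and 5 simply means disabling the NIC on the server in question, roughly like this (the interface name is only an example):

# bring the interface down (steps 2 and 5)
ifdown eth0        # or: ip link set eth0 down

# bring it back up again (step 4)
ifup eth0          # or: ip link set eth0 up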
FIO test specification:

[global]
direct=1
ioengine=libaio
iodepth=4
filename=/dev/vdb
runtime=300
numjobs=1

[maxthroughput]
rw=read
bs=16k

VM configuration:

<domain type='kvm' id='6'>
  <name>testvm</name>
  <uuid>93877c03-605b-ed67-1ab2-2ba16b5fb6b5</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-1.1'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source dev='/mnt/local/io-perf.img'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writethrough'/>
      <source dev='/mnt/shared/io-perf-testdisk.img'/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:36:5f:dd'/>
      <source network='default'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none'/>
</domain>

Thank you very much in advance,
Jose Lausuch
So is gluster just running on 10Mbit NICs, or on 56Gbit InfiniBand? With 1G NICs, assuming only replica 2, you are looking at pretty limited I/O for gluster to work with. That can cause long pauses and other timeouts, in my experience.
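To see what the links on the brick servers have actually negotiated, something like this on each server will show it (the interface name is only an example):

ethtool eth0 | grep -i speed     # e.g. "Speed: 1000Mb/s"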
On 10/22/2013 02:42 AM, José A. Lausuch Sales wrote:
> 2. While FIO is running, we take down the network of Server1 and
> observe whether Server2 takes over.

You're bringing server1 down by taking down the NIC (assuming from #5). This does take down the connection, but it does so without closing the TCP connection. Though this does represent a worst-case scenario, see http://joejulian.name/blog/keeping-your-vms-from-going-read-only-when-encountering-a-ping-timeout-in-glusterfs/

> 4. After a while we bring up the NIC on Server1 again. Our expectation
> was that Server1 would take over again (something like active-passive
> behavior), but this happens only 5-10% of the time; the CPU load stays
> on Server2.

I'm not sure I would have that expectation. The second server will have taken over the open FD and the reads should come from there. The reads for a given fd come from the first-to-respond to the lookup().

> After some tests, we realized that if we restart the glusterd daemon
> (/etc/init.d/glusterd restart) on Server1 after step 3 or before step 4,
> Server1 takes over automatically, without bringing down Server2 or
> anything like that.

Check the logs for glusterd (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) for clues. Perhaps the /way/ you're taking down the NIC is exposing some bug. Perhaps instead of taking it down, use iptables or just killall glusterfsd.
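For example, something along these lines (the port numbers and volume name are only illustrative; check "gluster volume status" for your actual brick ports) will either reset the TCP connections right away or close them cleanly, instead of leaving them hanging the way a downed NIC does:

# refuse gluster traffic so the client sees a TCP reset immediately
iptables -I INPUT -p tcp --dport 24007 -j REJECT --reject-with tcp-reset
iptables -I INPUT -p tcp --dport 49152:49160 -j REJECT --reject-with tcp-reset

# or close the brick process's connections cleanly
killall glusterfsd

# and, while testing, you can shorten the ping timeout (default is 42 seconds)
gluster volume set <volname> network.ping-timeout 10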
> We tested this with both the normal FUSE mount and libgfapi. With FUSE,
> the local mount sometimes becomes unavailable (ls no longer shows any
> files) if the fail-over fails.
>
> i) Is Gluster supposed to handle such a scenario, or are we making wrong
> assumptions? The only solution we found is to restart the daemon after a
> network outage, but this is not acceptable in a real scenario with VMs
> running real applications.

I host my (raw and qcow2) vm images on a gluster volume. Since my servers are not expected to hard-crash a lot, I take them down for maintenance (kernel updates and such) gracefully, killing the processes first. This closes the TCP connections and everything just keeps humming along.

> ii) What is the recommended configuration in terms of caching (QEMU:
> cache=none/writethrough/writeback) and direct I/O (FIO and Gluster) to
> maximize the reliability of the fail-over process? We varied these
> parameters but could not find a working configuration. Do these
> parameters have an impact at all?

To the best of my knowledge, none of those should affect reliability.
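For what it's worth, the "graceful" shutdown I mentioned is basically just stopping the gluster processes before taking the box down, so the clients see the connections close instead of waiting for a timeout. Something like this (service and process names may differ slightly between distros):

/etc/init.d/glusterd stop    # stop the management daemon
killall glusterfsd           # brick processes
killall glusterfs            # NFS/self-heal/client processes on that server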