Tony's performance sounds significantly subpar compared to my experience. I did some
testing with Gluster 3.12 and oVirt 3.9 on my running production cluster when I
enabled gfapi; even my "before" numbers are significantly better than what Tony
is reporting:
--------------------------------------------------
Before using gfapi:
]# dd if=/dev/urandom of=test.file bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 90.1843 s, 11.9 MB/s
# echo 3 > /proc/sys/vm/drop_caches
# dd if=test.file of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 3.94715 s, 272 MB/s
# hdparm -tT /dev/vda
/dev/vda:
Timing cached reads: 17322 MB in 2.00 seconds = 8673.49 MB/sec
Timing buffered disk reads: 996 MB in 3.00 seconds = 331.97 MB/sec
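(For the bonnie++ runs below, roughly: -f skips the slow per-character tests, -b
disables write buffering so every write is followed by an fsync, -n 0 skips the
file-creation tests, and -m only sets the label shown in the output.)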
# bonnie++ -d . -s 8G -n 0 -m pre-glapi -f -b -u root
Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
pre-glapi 8G 196245 30 105331 15 962775 49 1638 34
Latency 1578ms 1383ms 201ms 301ms
Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
pre-glapi 8G 155937 27 102899 14 1030285 54 1763 45
Latency 694ms 1333ms 114ms 229ms
(Note: the sequential read numbers appear to have been influenced by caching somewhere.)
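If anyone wants to repeat the dd runs with less cache influence, something along
these lines should help (a sketch, not what was run above): oflag=direct and
iflag=direct bypass the page cache, and an explicit bs avoids dd's 512-byte default
visible in the 2097152-record read counts above. /dev/urandom is usually the
bottleneck on the write side, so that figure says more about the RNG than the disk.
# dd if=/dev/urandom of=test.file bs=1M count=1024 oflag=direct
# dd if=test.file of=/dev/null bs=1M iflag=direct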
After switching to gfapi:
# dd if=/dev/urandom of=test.file bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 80.8317 s, 13.3 MB/s
# echo 3 > /proc/sys/vm/drop_caches
# dd if=test.file of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 3.3473 s, 321 MB/s
# hdparm -tT /dev/vda
/dev/vda:
Timing cached reads: 17112 MB in 2.00 seconds = 8568.86 MB/sec
Timing buffered disk reads: 1406 MB in 3.01 seconds = 467.70 MB/sec
# bonnie++ -d . -s 8G -n 0 -m glapi -f -b -u root
Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
glapi 8G 359100 59 185289 24 489575 31 2079 67
Latency 160ms 355ms 36041us 185ms
Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
glapi 8G 341307 57 180546 24 472572 35 2655 61
Latency 153ms 394ms 101ms 116ms
So, an excellent improvement in write throughput, but the significant improvement in
latency is what users noticed most. I've had anecdotal reports of 2x+ performance
improvements, with one user remarking that it's like having dedicated disks :)
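For anyone wanting to make the same switch: on recent oVirt releases gfapi is
normally enabled on the engine host with something like the following (a sketch;
the --cver value has to match your cluster compatibility level, and the exact
procedure may differ from what I did here):
# engine-config -s LibgfApiSupported=true --cver=4.2
# systemctl restart ovirt-engine
VMs only pick it up after a shutdown/restart; checking the qemu command line
(ps -ef | grep qemu) for a gluster:// drive URL instead of a /rhev/... FUSE path
is a quick way to confirm a given VM has actually switched over.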
This system is on my production cluster, so it's not getting exclusive disk
access, but the test VM itself is not doing anything else. The cluster is three Xeon
E5-2609 v3 @ 1.90GHz servers with 64 GB RAM and SATA2 disks; two have 9 spindles
each and one has 8 slightly faster disks (all spinners). They use ZFS stripes with
lz4 compression and 10G connectivity to 8 hosts, running Gluster 3.12.3 at the
moment. The cluster itself has about 70 running VMs in varying states of switching
to gfapi, but my main SQL servers are using their own volumes and not competing for
this one. These hosts have not yet had the Spectre/Meltdown patches applied.
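For reference, the ZFS layout described above amounts to something along these
lines (a sketch with placeholder pool and device names, not my exact commands;
redundancy comes from the Gluster replicas rather than from ZFS):
# zpool create -o ashift=12 tank /dev/sda /dev/sdb /dev/sdc
# zfs set compression=lz4 tank
# zfs create tank/brick1
Setting xattr=sa on the brick dataset is also commonly recommended for Gluster
bricks on ZFS, since Gluster is heavy on extended attributes.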
The next run will be skewed because I forced bonnie++ to assume only 4 GB of RAM
(-r 4096) rather than letting it size the test file against the server's real
memory, so reads will certainly be cached, but it gives an idea of what the
underlying volume can do disk-wise. This is run directly on the volume used above:
# bonnie++ -d . -s 8G -n 0 -m zfs-server -f -b -u root -r 4096
Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
zfs-server 8G 604940 79 510410 87 1393862 99 3164 91
Latency 99545us 100ms 247us 152ms
Just for fun, a screenshot from one of the servers showing base load plus this
testing is attached (PastedGraphic-1.png).

--------------------------------------------------
> From: Vincent Royer <vincent at epicenergy.ca>
> Subject: Re: [Gluster-users] Finding performance bottlenecks
> Date: May 3, 2018 at 1:58:03 PM CDT
> To: tony at hoyle.me.uk
> Cc: gluster-users at gluster.org
>
> It worries me how many threads talk about low performance. I'm about to
> build out a replica 3 setup and run oVirt with a bunch of Windows VMs.
>
> Are the issues Tony is experiencing "normal" for Gluster? Does anyone
> here have a system with Windows VMs and get good performance?
>
> Vincent Royer
> 778-825-1057
>
>
> <http://www.epicenergy.ca/>
> SUSTAINABLE MOBILE ENERGY SOLUTIONS
>
> On Wed, May 2, 2018 at 7:52 AM Tony Hoyle <tony at hoyle.me.uk> wrote:
> On 01/05/2018 02:27, Thing wrote:
> > Hi,
> >
> > So is the KVM or Vmware as the host(s)? I basically have the same setup,
> > ie 3 x 1TB "raid1" nodes and VMs, but 1gb networking. I do notice with
> > vmware using NFS disk was pretty slow (40% of a single disk) but this
> > was over 1gb networking which was clearly saturating. Hence I am moving
> > to KVM to use glusterfs hoping for better performance and bonding, it
> > will be interesting to see which host type runs faster.
>
> 1Gb will always be the bottleneck in that situation - that's going to
> max out at the speed of a single disk or lower. You need at minimum to
> bond interfaces and preferably go to 10Gb to do that.
>
> Our NFS actually ends up faster than local disk because the read speed
> of the raid is faster than the read speed of the local disk.
>
> > Which operating system is gluster on?
>
> Debian Linux. Supermicro motherboards, 24 core i7 with 128GB of RAM on
> the VM hosts.
>
> > Did you do iperf between all nodes?
>
> Yes, around 9.7Gb/s
>
> It doesn't appear to be raw read speed but iowait. Under nfs load with
> multiple VMs I get an iowait of around 0.3%. Under gluster, never less
> than 10% and glusterfsd is often the top of the CPU usage. This causes
> a load average of ~12 compared to 3 over NFS, and absolutely kills VMs
> esp. Windows ones - one machine I set booting and it was still booting
> 30 minutes later!
>
> Tony
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users