thr3ads.net - Gluster users - [Gluster-users] performance [Aug 2020]


Strahil Nikolov

2020-Aug-04 04:00 UTC

[Gluster-users] performance

On 4 August 2020 at 6:01:17 GMT+03:00, Computerisms Corporation <bob at
computerisms.ca> wrote:
>Hi Gurus,
>
>I have been trying to wrap my head around performance improvements on
>my gluster setup, and I don't seem to be making any progress.  I mean
>forward progress.  Making it worse takes practically no effort at all.
>
>My gluster is distributed-replicated across 6 bricks and 2 servers, with
>an arbiter on each server.  I designed it like this so I have an
>expansion path to more servers in the future (like the staggered arbiter
>diagram in the Red Hat documentation).  gluster v info output is below.
>
>I have compiled gluster 7.6 from sources on both servers.
There is a 7.7 version which fixes some stuff. Why do you have to compile
it from source?
>Servers are 6core/3.4Ghz with 32 GB RAM, no swap, and SSD and gigabit 
>network connections.  They are running debian, and are being used as 
>redundant web servers.  There is some 3Million files on the Gluster 
>Storage averaging 130KB/file.  
This type of workload is called 'metadata-intensive'.
There are some recommendations for this type of workload:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/small_file_performance_enhancements

Keep an eye on the section that mentions dirty-ratio = 5
& dirty-background-ratio = 2.
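
To put those two percentages in context, here is a rough Python sketch of how much dirty page cache they would allow on a 32 GB server like the ones described in this thread. It assumes the thresholds are taken against all of RAM; the kernel actually works from available (free plus reclaimable) memory, so the real limits would be somewhat lower. The function name is made up for illustration.

```python
GIB = 1024 ** 3

def dirty_limits(ram_bytes, dirty_ratio, dirty_background_ratio):
    """Return (hard_limit, background_limit) in bytes for ratios given in percent."""
    hard = ram_bytes * dirty_ratio // 100
    background = ram_bytes * dirty_background_ratio // 100
    return hard, background

# Assumption: 32 GB of RAM, as stated for these servers.
ram = 32 * GIB
hard, background = dirty_limits(ram, 5, 2)
print(f"writers are throttled above ~{hard / GIB:.2f} GiB of dirty pages")
print(f"background flushing starts above ~{background / GIB:.2f} GiB")
```

Lower ratios mean smaller, more frequent flushes to disk instead of large bursts.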

>Currently only one of the two servers is serving web services.  There
>are well over 100 sites, and apache server-status claims around 5 hits
>per second, depending on time of day, so a fair bit of logging going on.
>The gluster is only holding website data and config files that will be
>common between the two servers, no databases or anything like that on
>the Gluster.
>
>When the serving server is under load, load average is consistently
>12-20.  glusterfs is always at the top with 150%-250% cpu, and each of
>3 bricks at roughly 50-70%, so consistently pegging 4 of the 6 cores.
>apache processes will easily eat up all the rest of the cpus after
>that.  And web page response time is underwhelming at best.
>
>Interestingly, mostly because it is not something I have ever
>experienced before, software interrupts sit between 1 and 5 on each
>core, but the last core is usually sitting around 20.  Have never
>encountered a high load average where the si number was ever
>significant.  I have googled the crap out of that (as well as gluster
>performance in general), there are nearly limitless posts about what it
>is, but have yet to see one thing to explain what to do about it.
There is an explanation  about that in the link I provided above:

Configuring a higher event threads value than the available processing units
could again cause context switches on these threads. As a result reducing the
number deduced from the previous step to a number that is less than the
available processing units is recommended.
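
As a rough illustration of that rule (not something taken from the Red Hat guide itself), the cap could be expressed like this in Python. The helper name and the choice of "cores minus one" as the ceiling are assumptions; the guide only says the value should be less than the number of available processing units.

```python
import os

def cap_event_threads(deduced, processing_units=None):
    """Cap a deduced event-thread count below the number of processing units."""
    if processing_units is None:
        # Fall back to the local machine's CPU count if none is given.
        processing_units = os.cpu_count() or 1
    # Assumption: "less than the available processing units" is read as units - 1.
    ceiling = max(1, processing_units - 1)
    return max(1, min(deduced, ceiling))

# Example: a setting of 8 on a 6-core server would be capped.
print(cap_event_threads(8, processing_units=6))
```

Whether the capped value actually helps still has to be measured on the live system.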

>Sadly I can't really shut down the gluster process to confirm if that is
>the cause, but it's a pretty good bet, I think.
>
>When the system is not under load, glusterfs will be running at around 
>100% with each of the 3 bricks around 35%, so using 2 cores when doing 
>not much of anything.
>
>nload shows the network cards rarely climb above 300 Mbps unless I am
>doing a direct file transfer between the servers, in which case it gets
>right up to the 1Gbps limit.  RAM is never above 15GB unless I am
>causing it to happen.  atop shows a disk busy percentage; it is often
>above 50% and sometimes will hit 100%, but it is nowhere near as
>consistently showing excessive usage as the cpu cores are.  The cpu
>definitely seems to be the bottleneck.
>When I found out about the groups directory, I figured one of those
>must be useful to me, but as best as I can tell they are not.  But I am
>really hoping that someone has configured a system like mine and has a
>good group file they might share for this situation, or a peek at their
>volume info output?
>
>or maybe this is really just about as good as I should expect?  Maybe 
>the fix is that I need more/faster cores?  I hope not, as that isn't 
>really an option.
>
>Anyway, here is my volume info as promised.
>
>root@mooglian:/Computerisms/sites/computerisms.ca/log# gluster v info
>
>Volume Name: webisms
>Type: Distributed-Replicate
>Volume ID: 261901e7-60b4-4760-897d-0163beed356e
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 2 x (2 + 1) = 6
>Transport-type: tcp
>Bricks:
>Brick1: mooglian:/var/GlusterBrick/replset-0/webisms-replset-0
>Brick2: moogle:/var/GlusterBrick/replset-0/webisms-replset-0
>Brick3: moogle:/var/GlusterBrick/replset-0-arb/webisms-replset-0-arb (arbiter)
>Brick4: moogle:/var/GlusterBrick/replset-1/webisms-replset-1
>Brick5: mooglian:/var/GlusterBrick/replset-1/webisms-replset-1
>Brick6: mooglian:/var/GlusterBrick/replset-1-arb/webisms-replset-1-arb (arbiter)
>Options Reconfigured:
>auth.allow: xxxx
>performance.client-io-threads: off
>nfs.disable: on
>storage.fips-mode-rchecksum: on
>transport.address-family: inet
>performance.stat-prefetch: on
>network.inode-lru-limit: 200000
>performance.write-behind-window-size: 4MB
>performance.readdir-ahead: on
>performance.io-thread-count: 64
>performance.cache-size: 8GB
>server.event-threads: 8
>client.event-threads: 8
>performance.nl-cache-timeout: 600
As 'storage.fips-mode-rchecksum' is using sha256, you can try to disable
it - which should use the less cpu intensive md5. Yet, I have never played with
that option ...


Check the RH page about the tunings and try different values  for the event
threads.
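
If you want to step through several values methodically, a small Python sketch like the one below prints the `gluster volume set` commands for each candidate so they can be applied one at a time. It only builds the command strings and runs nothing. The volume name webisms and the two option names come from the volume info quoted in this thread; the candidate values and the helper name are just examples.

```python
def event_thread_commands(volume, values):
    """Build 'gluster volume set' command strings for each candidate value."""
    commands = []
    for value in values:
        # Set the server and client side to the same value for each trial.
        for option in ("server.event-threads", "client.event-threads"):
            commands.append(f"gluster volume set {volume} {option} {value}")
    return commands

# Example candidates only; pick values that suit the machine.
for line in event_thread_commands("webisms", [2, 4]):
    print(line)
```

Changing one pair of values per trial makes it easier to tell which change caused what.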


Best Regards,
Strahil Nikolov

Computerisms Corporation

2020-Aug-04 19:47 UTC


[Gluster-users] performance

Hi Strahil, thanks for your response.
>>
>> I have compiled gluster 7.6 from sources on both servers.
> 
> There is a 7.7 version which fixes some stuff. Why do you have to
> compile it from source?
Because I have often found with other stuff in the past compiling from 
source makes a bunch of problems go away.  software generally works the 
way the developers expect it to if you use the sources, so they are 
better able to help if required.  so now I generally compile most of my 
center-piece softwares and use packages for all the supporting stuff.
> 
>> Servers are 6core/3.4Ghz with 32 GB RAM, no swap, and SSD and gigabit
>> network connections.  They are running debian, and are being used as
>> redundant web servers.  There is some 3Million files on the Gluster
>> Storage averaging 130KB/file.
> 
> This type of workload is called 'metadata-intensive'.
does this mean the metadata-cache group file would be a good one to 
enable?  will try.

waited 10 minutes, no change that I can see.
> There are some recommendations for this type of workload:
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/small_file_performance_enhancements
> 
> Keep an eye on the section that mentions dirty-ratio = 5
> & dirty-background-ratio = 2.
I have actually read that whole manual, and specifically that page 
several times.  And also this one:

https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/administration_guide/small_file_performance_enhancements

Perhaps I am not understanding it correctly.  I tried these suggestions 
before and it got worse, not better.  so I have been operating under the 
assumption that maybe these guidelines are not appropriate for newer 
versions.

But will try again.  adjusting the dirty ratios.

Load average went from around 15 to 35 in about 2-3 minutes, but 20 
minutes later, it is back down to 20.  It may be having a minimal 
positive impact on cpu, though; I haven't seen the main glusterfs go over 
200% since I changed this, and the brick processes are hovering just 
below 50%  where they were consistently above 50% before.  Might just be 
time of day with the system not as busy.

after watching for 30 minutes, load average is fluctuating between 10 
and 30, but cpu idle appears marginally better on average than it was.
>> Interestingly, mostly because it is not something I have ever
>> experienced before, software interrupts sit between 1 and 5 on each
>> core, but the last core is usually sitting around 20.  Have never
>> encountered a high load average where the si number was ever
>> significant.  I have googled the crap out of that (as well as gluster
>> performance in general), there are nearly limitless posts about what it
>>
>> is, but have yet to see one thing to explain what to do about it.
> 
> There is an explanation  about that in the link I provided above:
> 
> Configuring a higher event threads value than the available processing
> units could again cause context switches on these threads. As a result reducing
> the number deduced from the previous step to a number that is less than the
> available processing units is recommended.
Okay, again, have played with these numbers before and it did not pan 
out as expected.  if I understand it correctly, I have 3 brick processes 
(glusterfsd), so the "deduced" number should be 3, and I should set it
lower than that, so 2.  but it also says "If a specific thread consumes 
more number of CPU cycles than needed, increasing the event thread count 
would enhance the performance of the Red Hat Storage Server."  which is 
why I had it at 8.

but will set it to 2 now.  load average is at 17 to start, waiting a 
while to see what happens.

so 15 minutes later, load average is currently 12, but is fluctuating 
between 10 and 20, have seen no significant change in cpu usage or 
anything else in top.

now try also changing server.outstanding-rpc-limit to 256 and wait.

15 minutes later; load has been above 30 but is currently back down to 
12.  no significant change in cpu.  try increasing to 512 and wait.

15 minutes later, load average is 50.  no significant difference in cpu. 
Software interrupts remain around where they were.  wa from top remains 
about where it was.  not sure why load average is climbing so high. 
changing rpc-limit to 128.

ugh.  10 minutes later, load average just popped over 100.  resetting 
rpc-limit.

now trying cluster.lookup-optimize on, lazy rebalancing (probably a bad 
idea on the live system, but how much worse can it get?)  Ya, bad idea, 
80 hours estimated to complete, load is over 50 and server is crawling. 
disabling rebalance and turning lookup-optimize off, for now.

right now the only suggested parameter I haven't played with is the 
performance.io-thread-count, which I currently have at 64.

sigh.  an hour later load average is 80 and climbing.  apache processes 
are numbering in the hundreds and I am constantly having to restart it. 
this brings load average down to 5, but as apache processes climb and 
are held open load average gets up to over 100 again within 3-4 minutes, 
and system starts going non-responsive.  rinse and repeat.

so followed all the recommendations, maybe the dirty settings had a 
small positive impact, but overall system is most definitely worse for 
having made the changes.

I have returned the configs back to how they were except the dirty 
settings and the metadata-cache group.  increased performance.cache-size 
to 16GB for now, because that is the one thing that seems to help when I 
"tune" (aka make worse) the system.  have had to restart apache a couple 
dozen times or more, but after another 30 minutes or so system has 
pretty much settled back to how it was before I started.  cpu is like I 
originally stated, all 6 cores maxed out most of the time, software 
interrupts still have all cpus running around 5 with the last one 
consistently sitting around 20-25.  Disk is busy but not usually maxed 
out.  RAM is about half used.  network load peaks at about 1/3 capacity. 
  load average is between 10 and 20.  sites are responding, but sluggish.

so am I not reading these recommendations and following the instructions 
correctly?  am I not waiting long enough after each implementation, 
should I be making 1 change per day instead of thinking 15 minutes 
should be enough for the system to catch up?  I have read the full red 
hat documentation and the significant majority of the gluster docs, 
maybe I am missing something else there?  should these settings have had 
a different effect than they did?
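
One thing that may bear on the waiting-time question: the Linux load average figures are exponentially damped averages over roughly 1, 5 and 15 minutes, so they lag behind a configuration change. Below is a small Python sketch of that lag, under the simplifying assumption that the real load steps instantly from one steady level to another. The function name and the example numbers are illustrative only.

```python
import math

def damped_load(old_level, new_level, minutes_elapsed, window_minutes):
    """Reported load average some minutes after a step from old_level to new_level."""
    # The weight of the old level decays exponentially with the window as time constant.
    weight_old = math.exp(-minutes_elapsed / window_minutes)
    return new_level + (old_level - new_level) * weight_old

# Example: true load drops from 15 to 10; look at each figure 15 minutes later.
for window in (1, 5, 15):
    print(window, round(damped_load(15, 10, 15, window), 2))
```

With this model the 15-minute figure still carries about 37% (e^-1) of the old level after 15 minutes, while the 1-minute figure has almost fully caught up.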

For what it's worth, I am running ext4 as my underlying fs and I have 
read a few times that XFS might have been a better choice.  But that is 
not a trivial experiment to make at this time with the system in 
production.  It's one thing (and still a bad thing to be sure) to 
semi-bork the system for an hour or two while I play with 
configurations, but would take a day or so offline to reformat and 
restore the data.
> 
> As 'storage.fips-mode-rchecksum' is using sha256, you can try to
> disable it - which should use the less cpu intensive md5. Yet, I have never
> played with that option ...
Done.  no significant difference that I can see.
> Check the RH page about the tunings and try different values for the event
> threads.
in the past I have tried 2, 4, 8, 16, and 32.  Playing with just those I 
never noticed that any of them made any difference.  Though I might have 
some different options now than I did then, so might try these again 
throughout the day...

Thanks again for your time Strahil, if you have any more thoughts would 
love to hear them.
> 
> 
> Best Regards,
> Strahil Nikolov
>
