Hi Gurus,

I have been trying to wrap my head around performance improvements on my gluster setup, and I don't seem to be making any progress. I mean forward progress; making it worse takes practically no effort at all.

My gluster is distributed-replicated across 6 bricks and 2 servers, with an arbiter on each server. I designed it like this so I have an expansion path to more servers in the future (like the staggered arbiter diagram in the Red Hat documentation). gluster v info output is below. I have compiled gluster 7.6 from source on both servers.

Servers are 6-core/3.4 GHz with 32 GB RAM, no swap, and SSD and gigabit network connections. They are running Debian, and are being used as redundant web servers. There are some 3 million files on the Gluster storage, averaging 130 KB/file. Currently only one of the two servers is serving web services. There are well over 100 sites, and apache server-status claims around 5 hits per second, depending on time of day, so a fair bit of logging going on. The gluster is only holding website data and config files that will be common between the two servers; no databases or anything like that on the Gluster.

When the serving server is under load, the load average is consistently 12-20. glusterfs is always at the top with 150%-250% CPU, and each of 3 bricks at roughly 50-70%, so consistently pegging 4 of the 6 cores. Apache processes will easily eat up all the rest of the CPUs after that. And web page response time is underwhelming at best.

Interestingly, mostly because it is not something I have ever experienced before, software interrupts sit between 1 and 5 on each core, but the last core is usually sitting around 20. I have never encountered a high load average where the si number was ever significant. I have googled the crap out of that (as well as gluster performance in general); there are nearly limitless posts about what it is, but I have yet to see one thing that explains what to do about it. Sadly I can't really shut down the gluster process to confirm if that is the cause, but it's a pretty good bet, I think.

When the system is not under load, glusterfs will be running at around 100% with each of the 3 bricks around 35%, so using 2 cores while doing not much of anything.

nload shows the network cards rarely climb above 300 Mbps unless I am doing a direct file transfer between the servers, in which case it gets right up to the 1 Gbps limit. RAM is never above 15 GB unless I am causing it to happen. atop shows a disk busy percentage; it is often above 50% and sometimes hits 100%, but it is nowhere near as consistently excessive as the CPU cores are. The CPU definitely seems to be the bottleneck.

When I found out about the groups directory, I figured one of those must be useful to me, but as best as I can tell they are not. But I am really hoping that someone has configured a system like mine and has a good group file they might share for this situation, or a peek at their volume info output?

Or maybe this is really just about as good as I should expect? Maybe the fix is that I need more/faster cores? I hope not, as that isn't really an option.

Anyway, here is my volume info as promised.
root at mooglian:/Computerisms/sites/computerisms.ca/log# gluster v info

Volume Name: webisms
Type: Distributed-Replicate
Volume ID: 261901e7-60b4-4760-897d-0163beed356e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: mooglian:/var/GlusterBrick/replset-0/webisms-replset-0
Brick2: moogle:/var/GlusterBrick/replset-0/webisms-replset-0
Brick3: moogle:/var/GlusterBrick/replset-0-arb/webisms-replset-0-arb (arbiter)
Brick4: moogle:/var/GlusterBrick/replset-1/webisms-replset-1
Brick5: mooglian:/var/GlusterBrick/replset-1/webisms-replset-1
Brick6: mooglian:/var/GlusterBrick/replset-1-arb/webisms-replset-1-arb (arbiter)
Options Reconfigured:
auth.allow: xxxx
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.stat-prefetch: on
network.inode-lru-limit: 200000
performance.write-behind-window-size: 4MB
performance.readdir-ahead: on
performance.io-thread-count: 64
performance.cache-size: 8GB
server.event-threads: 8
client.event-threads: 8
performance.nl-cache-timeout: 600

--
Bob Miller
Cell: 867-334-7117
Office: 867-633-3760
Office: 867-322-0362
www.computerisms.ca
On 4 August 2020 at 6:01:17 GMT+03:00, Computerisms Corporation <bob at computerisms.ca> wrote:

> I have compiled gluster 7.6 from source on both servers.

There is a 7.7 version which fixes some stuff. Why do you have to compile it from source?

> Servers are 6-core/3.4 GHz with 32 GB RAM, no swap, and SSD and gigabit
> network connections. [...] There are some 3 million files on the Gluster
> storage, averaging 130 KB/file.

This type of workload is called 'metadata-intensive'. There are some recommendations for this type of workload:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/small_file_performance_enhancements

Keep an eye on the section that mentions vm.dirty_ratio = 5 and vm.dirty_background_ratio = 2.

> Interestingly [...] software interrupts sit between 1 and 5 on each core,
> but the last core is usually sitting around 20. [...] there are nearly
> limitless posts about what it is, but I have yet to see one thing that
> explains what to do about it.

There is an explanation about that in the link I provided above: "Configuring a higher event threads value than the available processing units could again cause context switches on these threads. As a result reducing the number deduced from the previous step to a number that is less than the available processing units is recommended."
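For reference, a rough sketch of both tunings. The dirty ratios are the values from that Red Hat page, and 4 event threads is just one example of "less than your 6 cores"; test before relying on any of it:

    # kernel dirty-page tuning for small-file workloads, per the RH guide
    sysctl -w vm.dirty_ratio=5
    sysctl -w vm.dirty_background_ratio=2
    # persist across reboots
    printf 'vm.dirty_ratio = 5\nvm.dirty_background_ratio = 2\n' > /etc/sysctl.d/90-gluster.conf

    # bring the event threads below the 6 available cores (currently set to 8)
    gluster volume set webisms server.event-threads 4
    gluster volume set webisms client.event-threads 4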
> Options Reconfigured:
> [...]
> storage.fips-mode-rchecksum: on
> [...]

As 'storage.fips-mode-rchecksum' is using sha256, you can try to disable it, which should use the less CPU-intensive md5. Yet, I have never played with that option...
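If you test it, it is a single volume option (an untested suggestion on my part):

    # switch the brick rolling checksum from sha256 back to md5
    gluster volume set webisms storage.fips-mode-rchecksum off
    # confirm the current value
    gluster volume get webisms storage.fips-mode-rchecksum

Check the RH page about the tunings and try different values for the event threads.

Best Regards,
Strahil Nikolov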
I tried putting all web files (specifically WordPress php and static files, as well as various cache files) on gluster before, and the results were miserable on a busy site: our usual ~8-10 load quickly turned into 100+ and killed everything. I had to go back to running just the user uploads (which are static files in the WordPress uploads/ dir) on gluster and using rsync (via lsyncd) for the frequently executed php/cache, roughly the shape of the sketch below. I'd love to figure this out as well and tune gluster for heavy reads and moderate writes, but I haven't cracked that recipe yet.
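A minimal sketch of that split, assuming a hypothetical docroot /var/www/example and a standby server called web2 (paths and hostnames are illustrative, not from my actual setup):

    # uploads live on the gluster mount; the rest of the docroot stays on local disk
    ln -s /mnt/gluster/example/uploads /var/www/example/wp-content/uploads

    # mirror the php/cache to the standby server, leaving the uploads to gluster
    rsync -a --delete --exclude 'wp-content/uploads/' /var/www/example/ web2:/var/www/example/

lsyncd just watches the docroot with inotify and re-runs an rsync like that on each change.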
On Mon, Aug 3, 2020, 8:08 PM Computerisms Corporation <bob at computerisms.ca> wrote:

> [...]