----- Original Message -----
> From: "David Miller" <dmiller at metheus.org>
> To: gluster-users at gluster.org
> Sent: Thursday, May 4, 2017 2:48:38 PM
> Subject: [Gluster-users] Very odd performance issue
>
> Background: 4 identical gluster servers with 15 TB each in 2x2 setup.
> CentOS Linux release 7.3.1611 (Core)
> glusterfs-server-3.9.1-1.el7.x86_64
> client systems are using:
> glusterfs-client 3.5.2-2+deb8u3
>
> The cluster has ~12 TB in use with 21 million files. Lots of jpgs. About 12
> clients are mounting gluster volumes.
>
> Network load is light: iftop shows each server has 10-15 Mbit reads and about
> half that in writes.
>
> What I'm seeing that concerns me is that one box, gluster4, has roughly twice
> the CPU utilization and twice or more the load average of the other three
> servers. gluster4 has a 24 hour average of about 30% CPU utilization,
> something that seems to me to be way out of line for a couple MB/sec of
> traffic.
>
> In running volume top, the odd thing I see is that for gluster1-3 I get
> latency summaries like this:
> Brick: gluster1.publicinteractive.com :/gluster/drupal_prod
> ?????????????????????????????
> %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
> -------- ----------- ----------- ----------- ------------ ----
>
> 9.96 675.07 us 15.00 us 1067793.00 us 205060 INODELK
> 15.85 3414.20 us 16.00 us 773621.00 us 64494 READ
> 51.35 2235.96 us 12.00 us 1093609.00 us 319120 LOOKUP
>
> ... but my problem server has far more INODELK latency:
>
> 12.01 4712.03 us 17.00 us 1773590.00 us 47214 READ
> 27.50 2390.27 us 14.00 us 1877571.00 us 213121 INODELK
> 28.70 1643.65 us 12.00 us 1837696.00 us 323407 LOOKUP
>
> The servers are intended to be identical, and are indeed identical
> hardware.
>
> Suggestions on where to look or which FM to RT very welcome indeed.
IIRC INODELK is for internal locking / synchronization:
"GlusterFS has locks translator which provides the following internal
locking operations called inodelk, entrylk which are used by afr to achieve
synchronization of operations on files or directories that conflict with each
other."
I found a bug where there was a leak:
https://bugzilla.redhat.com/show_bug.cgi?id=1405886
It was fixed in the 3.8 line. It may be worth looking into upgrading the gluster
version on your clients to eliminate any issues that were fixed between 3.5 (your
client version) and 3.9 (your server version).
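For what it's worth, a quick way to see exactly what each side is running before
you plan an upgrade (plain package queries, nothing gluster-specific):

    # on the Debian clients
    glusterfs --version
    dpkg -l | grep -i glusterfs

    # on the CentOS servers
    rpm -qa | grep -i glusterfs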
Also, have a look at the brick and client logs. You could try searching them
for "INODELK". Are your clients accessing a lot of the same files at
the same time? Also, on the server where you are seeing the higher load, check
the self heal daemon logs to see if there is any healing happening.
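Something like the following is usually enough for a first pass; the brick log
name follows the brick path, and I'm again guessing the volume name:

    # on gluster4: brick log and self-heal daemon log
    grep -i inodelk /var/log/glusterfs/bricks/gluster-drupal_prod.log | tail -50
    tail -100 /var/log/glusterfs/glustershd.log

    # check for pending or in-progress heals
    gluster volume heal drupal_prod info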
Sorry I don't have anything concrete; like I said, it may be worth upgrading
the clients and having a look at your logs to see if you can glean any
information from them.
-b
>
> Thanks,
>
> David
>
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users