Hi Strahil,
Thanks again for your help. I checked, and most of my clients are on 3.13.2,
which I think is the default version packaged with Ubuntu.
I upgraded a test VM to v5.6 and tested again, but there is no difference;
performance accessing the cluster is the same.
Cheers,
-Patrick
On Sun, Apr 21, 2019 at 11:39 PM Strahil <hunter86_bg at yahoo.com> wrote:
> This looks more like a FUSE problem.
> Are the clients on v3.12.xx ?
> Can you set up a VM for a test and try FUSE mounts using v5.6 and v6.x?
>
> Best Regards,
> Strahil Nikolov
> On Apr 21, 2019 17:24, Patrick Rennie <patrickmrennie at gmail.com> wrote:
>
> Hi Strahil,
>
> Thank you for your reply and your suggestions. I'm not sure which logs
> would be most relevant for diagnosing this issue: the brick logs, the
> cluster mount logs, the shd logs, or something else? I have posted a few
> errors that I have seen repeated several times already, and I will
> continue to post anything further that I see.
> I am working on migrating data to some new storage, which will slowly
> free up space, although this is a production cluster and new data is
> being uploaded every day, sometimes faster than I can migrate it off. I
> have several other similar clusters and none of them have this problem;
> one of the others is actually at 98-99% capacity right now (a big
> problem, I know) but still performs perfectly fine compared to this
> cluster, so I am not sure low space is the root cause here.
>
> I currently have 13 VMs accessing this cluster; I have checked each one,
> and all of them use one of the two entries below to mount the cluster in
> fstab:
>
> HOSTNAME:/gvAA01 /mountpoint glusterfs
> defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no
> 0 0
> HOSTNAME:/gvAA01 /mountpoint glusterfs
> defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable
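Incidentally, fstab and the live mount can drift apart, so it may be worth confirming the options each client actually mounted with from the kernel mount table. A small sketch (the sample entry below imitates a /proc/mounts line for this volume; on a real client, point the grep at /proc/mounts itself):

```shell
# Fake /proc/mounts entry for illustration only; a real client would
# read /proc/mounts directly instead of this sample file.
cat > /tmp/mounts-sample <<'EOF'
HOSTNAME:/gvAA01 /mountpoint fuse.glusterfs rw,relatime,user_id=0,group_id=0 0 0
EOF
# Print the mountpoint and effective options of glusterfs FUSE mounts.
grep 'fuse.glusterfs' /tmp/mounts-sample | awk '{print $2, $4}'
# prints: /mountpoint rw,relatime,user_id=0,group_id=0
```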
>
> I also have a few other VMs which use NFS to access the cluster, and
> these machines appear to be significantly quicker. Initially I get a
> similar delay with NFS, but if I cancel the first "ls" and try it again
> I get < 1 sec lookups; the same listing can take over 10 minutes via the
> FUSE/gluster client, and the trick of cancelling and trying again
> doesn't work for FUSE/gluster. Sometimes the NFS queries have no delay
> at all, so this is a bit strange to me.
> HOSTNAME:/gvAA01 /mountpoint/ nfs
> defaults,_netdev,vers=3,async,noatime 0 0
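On the FUSE side, slow directory listings are often tuned through Gluster's readdir options. A hedged config sketch, not a recommendation: these are standard `gluster volume set` options, but whether they help this particular workload is an assumption, so trialling them against the test VM first would be safer (note that `parallel-readdir` requires `readdir-ahead`, and that some of the clients above mount with `use-readdirp=no`, which interacts with these):

```
gluster volume set gvAA01 performance.readdir-ahead on
gluster volume set gvAA01 performance.parallel-readdir on
```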
>
> Example:
> user@VM:~$ time ls /cluster/folder
> ^C
>
> real 9m49.383s
> user 0m0.001s
> sys 0m0.010s
>
> user@VM:~$ time ls /cluster/folder
> <results>
>
> real 0m0.069s
> user 0m0.001s
> sys 0m0.007s
>
> ---
>
> I have checked the profiling as you suggested: I let it run for around a
> minute, then cancelled the "ls" and saved the profile info.
>
> root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start
> Starting volume profile on gvAA01 has been successful
> root@HOSTNAME:/var/log/glusterfs# time ls /cluster/folder
> ^C
>
> real 1m1.660s
> user 0m0.000s
> sys 0m0.002s
>
> root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info > ~/profile.txt
> root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop
>
> I will attach the results to this email as the output is over 1000 lines.
> Unfortunately, I'm not sure what I'm looking at, but hopefully somebody
> will be able to help me make sense of it and let me know if it highlights
> any specific issues.
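For reading the profile dump, sorting the per-fop table by %-latency usually shows where the time goes. A minimal sketch: the rows below are made up for illustration (the real input would be the saved profile.txt), and the field positions assume the usual profile column order of %-latency, avg/min/max latency, calls, fop:

```shell
# Illustrative rows in the shape of "gluster volume profile ... info" output;
# these numbers are invented, not taken from the real profile.txt.
cat > /tmp/profile-sample.txt <<'EOF'
 95.20  812345.00 us  10.00 us  9999999.00 us  1200  LOOKUP
  3.10   51000.00 us   8.00 us   400000.00 us   300  READDIRP
  1.70    9000.00 us   5.00 us    90000.00 us   500  STAT
EOF
# Highest %-latency first; the top rows are where the volume spends its time.
sort -rn -k1,1 /tmp/profile-sample.txt | head -3
```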
>
> Happy to try any further suggestions. Thank you,
>
> -Patrick
>
> On Sun, Apr 21, 2019 at 7:55 PM Strahil <hunter86_bg at yahoo.com> wrote:
>
> By the way, can you provide the 'volume info' and the mount options on
> all clients?
> Maybe one of the clients' mount options enables something that uses a
> lot of resources.
>
> Best Regards,
> Strahil Nikolov
> On Apr 21, 2019 10:55, Patrick Rennie <patrickmrennie at gmail.com> wrote:
>
> Just another small update: I'm continuing to watch my brick logs, and I
> just saw these errors come up in the recent events too. I am going to
> continue to post any errors I see in the hope of finding the right one
> to try and fix.
> This is from the logs on brick1
>
>