Hu Bert
2018-Jul-27 08:02 UTC
[Gluster-users] Gluster 3.12.12: performance during heal and in general
2018-07-27 9:22 GMT+02:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>
> On Fri, Jul 27, 2018 at 12:36 PM, Hu Bert <revirii at googlemail.com> wrote:
>>
>> 2018-07-27 8:52 GMT+02:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>> >
>> > On Fri, Jul 27, 2018 at 11:53 AM, Hu Bert <revirii at googlemail.com> wrote:
>> >>
>> >> > Do you already have all the 190000 directories created? If not,
>> >> > could you find out which of the paths need it and do a stat directly
>> >> > instead of find?
>> >>
>> >> Quite probably not all of them have been created (but counting how
>> >> many would take very long...). Hm, maybe running stat in a double loop
>> >> (thx to our directory structure) would help. Something like this (may
>> >> not be 100% correct):
>> >>
>> >> for a in {100..999}; do
>> >>   for b in {100..999}; do
>> >>     stat /$a/$b/
>> >>   done
>> >> done
>> >>
>> >> That should run stat on all directories. I think i'll give this a try.
>> >
>> > Just to prevent these being served from a cache, it is probably better
>> > to do this from a fresh mount?
>> >
>> > --
>> > Pranith
>>
>> Good idea. I'll install the glusterfs client on a little-used machine, so
>> there should be no caching. Thx! Have a good weekend when the time
>> comes :-)
>
> If this proves effective, what you also need to do is unmount and mount
> again, something like:
>
> mount
> for a in {100..999}; do
>   for b in {100..999}; do
>     stat /$a/$b/
>   done
> done
> unmount

I'll see what is possible over the weekend.

Btw.: i've seen in the munin stats that the disk utilization for
bricksdd1 on the healthy gluster servers is between 70% (night) and
almost 99% (daytime). So it looks like the basic problem is the disk,
which seems unable to work any faster. If so, (heal) performance won't
improve with this setup, i assume. Maybe switching to RAID10
(conventional hard disks) or SSDs, or even adding 3 additional gluster
servers (distributed replicated), could help?
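
Spelled out end to end, the fresh-mount run could look roughly like
this sketch (gluster1:/shared and the mount point are placeholders,
not our real names):

    #!/bin/bash
    # Fresh-mount stat walk. Server (gluster1), volume (shared) and
    # mount point below are assumptions -- substitute the real values.
    MNT=/mnt/healcheck
    mkdir -p "$MNT"
    mount -t glusterfs gluster1:/shared "$MNT"

    # stat every /NNN/NNN directory so each one gets a fresh lookup,
    # which lets self-heal create any directory still missing on the
    # previously failed brick.
    for a in {100..999}; do
      for b in {100..999}; do
        stat "$MNT/$a/$b/" >/dev/null 2>&1
      done
    done

    umount "$MNT"

Redirecting stat's output just keeps the run quiet; the lookups
themselves are what matter.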
Pranith Kumar Karampuri
2018-Jul-27 08:31 UTC
[Gluster-users] Gluter 3.12.12: performance during heal and in general
On Fri, Jul 27, 2018 at 1:32 PM, Hu Bert <revirii at googlemail.com> wrote:
>
> I'll see what is possible over the weekend.
>
> Btw.: i've seen in the munin stats that the disk utilization for
> bricksdd1 on the healthy gluster servers is between 70% (night) and
> almost 99% (daytime). So it looks like the basic problem is the disk,
> which seems unable to work any faster. If so, (heal) performance won't
> improve with this setup, i assume.

It could be saturating in the day. But if enough self-heals are going
on, even in the night it should have been close to 100%.

> Maybe switching to RAID10 (conventional hard disks) or SSDs, or even
> adding 3 additional gluster servers (distributed replicated), could
> help?

It definitely will give better protection against hardware failure.
The failure domain will be smaller.

--
Pranith
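
As an aside on the utilization question: a live view of the device can
confirm whether the disk really has no headroom left. For example (the
device name sdd is only an assumption derived from the brick name
bricksdd1):

    # Extended device stats every 5 seconds; %util close to 100 means
    # the device is saturated. "sdd" is assumed from the brick name
    # "bricksdd1" -- adjust to the actual device.
    iostat -x sdd 5

Munin averages over its sampling interval, so a sustained ~100% in
iostat is stronger evidence of saturation than a peak in a graph.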
Hu Bert
2018-Aug-14 07:37 UTC
[Gluster-users] Gluster 3.12.12: performance during heal and in general
Hi there,

well, it seems the heal has finally finished. Couldn't see/find any
related log message; is there such a message in a specific log file?

But i see the same behaviour as when the last heal finished: all CPU
cores are consumed by brick processes; not only by the formerly failed
bricksdd1, but by all 4 brick processes (and their threads). Load goes
up to > 100 on the 2 servers with the not-failed brick, and
glustershd.log gets filled with a lot of entries. Load on the server
with the then-failed brick is not that high, but still ~60.

Is this behaviour normal? Is there some post-heal activity after a heal
has finished?

thx in advance :-)
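
For what it's worth, this is how i checked that nothing is left to
heal (the volume name "shared" below is a placeholder for the real
one):

    # Lists entries still pending heal, per brick; empty lists under
    # every brick mean nothing is pending. "shared" is a placeholder.
    gluster volume heal shared info

    # Per-brick count of pending entries; quicker on large volumes.
    gluster volume heal shared statistics heal-count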