Hu Bert
2018-Jul-27 08:02 UTC
[Gluster-users] Gluster 3.12.12: performance during heal and in general
2018-07-27 9:22 GMT+02:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>
> On Fri, Jul 27, 2018 at 12:36 PM, Hu Bert <revirii at googlemail.com> wrote:
>>
>> 2018-07-27 8:52 GMT+02:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>> >
>> > On Fri, Jul 27, 2018 at 11:53 AM, Hu Bert <revirii at googlemail.com> wrote:
>> >>
>> >> > Do you already have all the 190000 directories created? If not,
>> >> > could you find out which of the paths need it and do a stat directly
>> >> > instead of find?
>> >>
>> >> Quite probably not all of them have been created (but counting how
>> >> many would take very long...). Hm, maybe running stat in a double loop
>> >> (thx to our directory structure) would help. Something like this (may
>> >> not be 100% correct):
>> >>
>> >> for a in {100..999}; do
>> >>   for b in {100..999}; do
>> >>     stat /$a/$b/
>> >>   done
>> >> done
>> >>
>> >> That should run stat on all directories. I think i'll give this a try.
>> >
>> > Just to prevent these being served from a cache, it is probably better
>> > to do this from a fresh mount?
>> >
>> > --
>> > Pranith
>>
>> Good idea. I'll install the glusterfs client on a little-used machine, so
>> there should be no caching. Thx! Have a good weekend when the time
>> comes :-)
>
> If this proves effective, what you also need to do is unmount and mount
> again, something like:
>
> mount
> for a in {100..999}; do
>   for b in {100..999}; do
>     stat /$a/$b/
>   done
> done
> unmount

I'll see what is possible over the weekend.

Btw.: i've seen in the munin stats that the disk utilization for
bricksdd1 on the healthy gluster servers is between 70% (night) and
almost 99% (daytime). So it looks like the basic problem is the disk,
which seems unable to work any faster. If so, (heal) performance won't
improve with this setup, i assume. Maybe switching to RAID10
(conventional hard disks) or SSDs, or even adding 3 additional gluster
servers (distributed replicated), could help?
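
Spelled out end to end, the fresh-mount run could look roughly like
this sketch (gluster1:/shared and the mount point are placeholders,
not our real names):

    #!/bin/bash
    # Fresh-mount stat walk. Server (gluster1), volume (shared) and
    # mount point below are assumptions -- substitute the real values.
    MNT=/mnt/healcheck
    mkdir -p "$MNT"
    mount -t glusterfs gluster1:/shared "$MNT"

    # stat every /NNN/NNN directory so each one gets a fresh lookup,
    # which lets self-heal create any directory still missing on the
    # previously failed brick.
    for a in {100..999}; do
      for b in {100..999}; do
        stat "$MNT/$a/$b/" >/dev/null 2>&1
      done
    done

    umount "$MNT"

Redirecting stat's output just keeps the run quiet; the lookups
themselves are what matter.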
Pranith Kumar Karampuri
2018-Jul-27 08:31 UTC
[Gluster-users] Gluter 3.12.12: performance during heal and in general
On Fri, Jul 27, 2018 at 1:32 PM, Hu Bert <revirii at googlemail.com> wrote:
>
> I'll see what is possible over the weekend.
>
> Btw.: i've seen in the munin stats that the disk utilization for
> bricksdd1 on the healthy gluster servers is between 70% (night) and
> almost 99% (daytime). So it looks like the basic problem is the disk,
> which seems unable to work any faster. If so, (heal) performance won't
> improve with this setup, i assume.

It could be saturating in the day. But if enough self-heals are going
on, even in the night it should have been close to 100%.

> Maybe switching to RAID10 (conventional hard disks) or SSDs, or even
> adding 3 additional gluster servers (distributed replicated), could
> help?

It definitely will give better protection against hardware failure.
The failure domain will be smaller.

--
Pranith
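
As an aside on the utilization question: a live view of the device can
confirm whether the disk really has no headroom left. For example (the
device name sdd is only an assumption derived from the brick name
bricksdd1):

    # Extended device stats every 5 seconds; %util close to 100 means
    # the device is saturated. "sdd" is assumed from the brick name
    # "bricksdd1" -- adjust to the actual device.
    iostat -x sdd 5

Munin averages over its sampling interval, so a sustained ~100% in
iostat is stronger evidence of saturation than a peak in a graph.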
Hu Bert
2018-Aug-14 07:37 UTC
[Gluster-users] Gluster 3.12.12: performance during heal and in general
Hi there,

well, it seems the heal has finally finished. Couldn't see/find any
related log message; is there such a message in a specific log file?

But i see the same behaviour as when the last heal finished: all CPU
cores are consumed by brick processes; not only by the formerly failed
bricksdd1, but by all 4 brick processes (and their threads). Load goes
up to > 100 on the 2 servers with the not-failed brick, and
glustershd.log gets filled with a lot of entries. Load on the server
with the then-failed brick is not that high, but still ~60.

Is this behaviour normal? Is there some post-heal activity after a heal
has finished?

thx in advance :-)
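
For what it's worth, this is how i checked that nothing is left to
heal (the volume name "shared" below is a placeholder for the real
one):

    # Lists entries still pending heal, per brick; empty lists under
    # every brick mean nothing is pending. "shared" is a placeholder.
    gluster volume heal shared info

    # Per-brick count of pending entries; quicker on large volumes.
    gluster volume heal shared statistics heal-count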