thr3ads.net - Gluster users - [Gluster-users] performance.cache-size for high-RAM clients/servers, other tweaks for performance, and improvements to Gluster docs [Apr 2018]

Artem Russakovskii

2018-Apr-18 06:22 UTC

[Gluster-users] performance.cache-size for high-RAM clients/servers, other tweaks for performance, and improvements to Gluster docs

Thanks for the link. Looking at the status of that doc, it isn't quite
ready yet, and there's no mention of the option.

Does it mean that whatever is ready now in 4.0.1 is incomplete but can be
enabled via granular-entry-heal=on, and when it is complete, it'll become
the default and the flag will simply go away?

Is there any risk enabling the option now in 4.0.1?


Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
<https://plus.google.com/+ArtemRussakovskii> | @ArtemR
<http://twitter.com/ArtemR>

On Tue, Apr 17, 2018 at 11:16 PM, Ravishankar N <ravishankar at redhat.com>
wrote:
>
>
> On 04/18/2018 10:35 AM, Artem Russakovskii wrote:
>
> Hi Ravi,
>
> Could you please expand on how these would help?
>
> By forcing full here, we move the logic from the CPU to network, thus
> decreasing CPU utilization, is that right?
>
> Yes, 'diff' employs the rchecksum FOP which does a sha256 checksum which
> can consume CPU. So yes it is sort of shifting the load from CPU to the
> network. But if your average file size is small, it would make sense to
> copy the entire file instead of computing checksums.
>
> This is assuming the CPU and disk utilization are caused by the differ and
> not by lstat and other calls or something.
>
>> Option: cluster.data-self-heal-algorithm
>> Default Value: (null)
>> Description: Select between "full", "diff". The "full" algorithm copies
>> the entire file from source to sink. The "diff" algorithm copies to sink
>> only those blocks whose checksums don't match with those of source. If no
>> option is configured the option is chosen dynamically as follows: If the
>> file does not exist on one of the sinks or empty file exists or if the
>> source file size is about the same as page size the entire file will be
>> read and written i.e "full" algo, otherwise "diff" algo is chosen.
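
The dynamic choice described in the help text above can be sketched roughly as follows. This is only an illustration of the wording, not Gluster's actual code: the function name, its parameters, the 4096-byte page size, and the reading of "about the same as page size" as "no larger than one page" are all assumptions.

```python
def choose_heal_algorithm(configured, sink_exists, sink_size, source_size,
                          page_size=4096):
    """Return "full" or "diff" following the help text's description.

    Hypothetical helper: every name and the page_size default are assumptions.
    """
    # An explicit setting always wins.
    if configured in ("full", "diff"):
        return configured
    # No setting: a missing or empty sink file is copied whole.
    if not sink_exists or sink_size == 0:
        return "full"
    # Assumption: "about the same as page size" is read as "at most one page".
    if source_size <= page_size:
        return "full"
    # Otherwise only blocks whose checksums differ are copied.
    return "diff"


print(choose_heal_algorithm(None, True, 10_000_000, 10_000_000))
print(choose_heal_algorithm("full", True, 10_000_000, 10_000_000))
```

With many small image files, most heals would fall into the "full" branch anyway under this reading, which is consistent with the advice to set the option to "full" explicitly.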
>
>
> I really have no idea what this means and how/why it would help. Any more
> info on this option?
>
>
> https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.8/granular-entry-self-healing.md
> should help.
> Regards,
> Ravi
>
>
>> Option: cluster.granular-entry-heal
>> Default Value: no
>> Description: If this option is enabled, self-heal will resort to granular
>> way of recording changelogs and doing entry self-heal.
>
>
> Thank you.
>
>
> Sincerely,
> Artem
>
>
> On Tue, Apr 17, 2018 at 9:58 PM, Ravishankar N <ravishankar at redhat.com>
> wrote:
>
>>
>> On 04/18/2018 10:14 AM, Artem Russakovskii wrote:
>>
>> Following up here on a related issue that is very serious for us.
>>
>> I took down one of the 4 replicate gluster servers for maintenance today.
>> There are 2 gluster volumes totaling about 600GB. Not that much data. After
>> the server came back online, it started auto healing and pretty much all
>> operations on gluster froze for many minutes.
>>
>> For example, I was trying to run an ls -alrt in a folder with 7300 files,
>> and it took a good 15-20 minutes before returning.
>>
>> During this time, I can see iostat show 100% utilization on the brick,
>> heal status takes many minutes to return, glusterfsd uses up tons of CPU (I
>> saw it spike to 600%). gluster already has massive performance issues for
>> me, but healing after a 4-hour downtime is on another level of bad perf.
>>
>> For example, this command took many minutes to run:
>>
>> gluster volume heal androidpolice_data3 info summary
>> Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
>> Status: Connected
>> Total Number of entries: 91
>> Number of entries in heal pending: 90
>> Number of entries in split-brain: 0
>> Number of entries possibly healing: 1
>>
>> Brick forge:/mnt/forge_block4/androidpolice_data3
>> Status: Connected
>> Total Number of entries: 87
>> Number of entries in heal pending: 86
>> Number of entries in split-brain: 0
>> Number of entries possibly healing: 1
>>
>> Brick hive:/mnt/hive_block4/androidpolice_data3
>> Status: Connected
>> Total Number of entries: 87
>> Number of entries in heal pending: 86
>> Number of entries in split-brain: 0
>> Number of entries possibly healing: 1
>>
>> Brick citadel:/mnt/citadel_block4/androidpolice_data3
>> Status: Connected
>> Total Number of entries: 0
>> Number of entries in heal pending: 0
>> Number of entries in split-brain: 0
>> Number of entries possibly healing: 0
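
To track heal progress without reading the summary by eye, the output above could be parsed with a small script. This is a minimal sketch that assumes the text layout shown in this message (a "Brick ..." line followed by "name: number" lines); the function name is hypothetical and the sample is trimmed to two of the four bricks.

```python
def parse_heal_summary(text):
    """Map each brick to its numeric counters from `heal ... info summary`."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        # Tolerate email quote markers in pasted output.
        line = raw.lstrip("> ").strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = {}
        elif current and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            # Skip non-numeric fields such as "Status: Connected".
            if value.isdigit():
                bricks[current][key.strip()] = int(value)
    return bricks


SAMPLE = """\
Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
Status: Connected
Total Number of entries: 91
Number of entries in heal pending: 90
Number of entries in split-brain: 0
Number of entries possibly healing: 1

Brick citadel:/mnt/citadel_block4/androidpolice_data3
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
"""

summary = parse_heal_summary(SAMPLE)
pending = sum(b.get("Number of entries in heal pending", 0)
              for b in summary.values())
print(pending)  # prints 90
```

Running the real command in a loop and feeding its output to this parser would show whether the pending count is actually shrinking between crawls.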
>>
>>
>> Statistics showed a diminishing number of failed heals:
>> ...
>> Ending time of crawl: Tue Apr 17 21:13:08 2018
>>
>> Type of crawl: INDEX
>> No. of entries healed: 2
>> No. of entries in split-brain: 0
>> No. of heal failed entries: 102
>>
>> Starting time of crawl: Tue Apr 17 21:13:09 2018
>>
>> Ending time of crawl: Tue Apr 17 21:14:30 2018
>>
>> Type of crawl: INDEX
>> No. of entries healed: 4
>> No. of entries in split-brain: 0
>> No. of heal failed entries: 91
>>
>> Starting time of crawl: Tue Apr 17 21:14:31 2018
>>
>> Ending time of crawl: Tue Apr 17 21:15:34 2018
>>
>> Type of crawl: INDEX
>> No. of entries healed: 0
>> No. of entries in split-brain: 0
>> No. of heal failed entries: 88
>> ...
>>
>> Eventually, everything heals and goes back to at least where the roof
>> isn't on fire anymore.
>>
>> The server stats and volume options were given in one of the previous
>> replies to this thread.
>>
>> Any ideas or things I could run and show the output of to help diagnose?
>> I'm also very open to working with someone on the team on a live debugging
>> session if there's interest.
>>
>>
>> It is likely that self-heal is causing the CPU spike due to the flood of
>> lookups/locks and checksum fops that the self-heal-daemon sends to the
>> bricks.
>> There's a script to control shd's cpu usage using cgroups. That should
>> help in regulating self-heal traffic: https://review.gluster.org/#/c/18404/
>> (see extras/control-cpu-load.sh)
>> Other self-heal related volume options that you could change are setting
>> 'cluster.data-self-heal-algorithm' to 'full' and 'granular-entry-heal'
>> to 'enable'. `gluster volume set help` should give you more information
>> about these options.
>> Thanks,
>> Ravi
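
The two option changes suggested above could be written out as `gluster volume set` invocations. The sketch below only builds and prints the command lines; it does not run them. The helper name is hypothetical, the volume name is taken from the heal output earlier in this message, and whether a given release accepts both settings through `volume set` is an assumption to verify against `gluster volume set help`.

```python
def heal_tuning_commands(volume):
    """Build (but do not run) the two suggested `gluster volume set` calls."""
    settings = [
        ("cluster.data-self-heal-algorithm", "full"),
        ("cluster.granular-entry-heal", "enable"),
    ]
    return [["gluster", "volume", "set", volume, option, value]
            for option, value in settings]


for argv in heal_tuning_commands("androidpolice_data3"):
    print(" ".join(argv))
```

Keeping the commands as argument lists makes it easy to pass them to `subprocess.run` later, once the settings have been checked on a test volume.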
>>
>>
>>
>> Thank you.
>>
>>
>> Sincerely,
>> Artem
>>
>>
>> On Tue, Apr 10, 2018 at 9:56 AM, Artem Russakovskii <archon810 at gmail.com>
>> wrote:
>>
>>> Hi Vlad,
>>>
>>> I actually saw that post already and even asked a question 4 days ago
>>> (https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode#comment1172497_540917).
>>> The accepted answer also seems to go against your suggestion to enable
>>> direct-io-mode as it says it should be disabled for better performance
>>> when used just for file accesses.
>>>
>>> It'd be great if someone from the Gluster team chimed in about this
>>> thread.
>>>
>>>
>>> Sincerely,
>>> Artem
>>>
>>>
>>> On Tue, Apr 10, 2018 at 7:01 AM, Vlad Kopylov <vladkopy at gmail.com>
>>> wrote:
>>>
>>>> Wish I knew or was able to get a detailed description of those options
>>>> myself.
>>>> Here is direct-io-mode: https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode
>>>> Same as you, I ran tests on a large volume of files, finding that the main
>>>> delays are in attribute calls, and ended up with those mount options to
>>>> improve performance.
>>>> I discovered those options basically by googling this user list,
>>>> where people share their tests.
>>>> Not sure I would share your optimism, and rather than going up I
>>>> downgraded to 3.12 and have no dir view issue now. Though I had to recreate
>>>> the cluster and had to re-add bricks with existing data.
>>>>
>>>> On Tue, Apr 10, 2018 at 1:47 AM, Artem Russakovskii <archon810 at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Vlad,
>>>>>
>>>>> I'm using only localhost: mounts.
>>>>>
>>>>> Can you please explain what effect each option has on performance
>>>>> issues shown in my posts?
>>>>> "negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5"
>>>>> From what I remember, direct-io-mode=enable didn't make a difference in my
>>>>> tests, but I suppose I can try again. The explanations about direct-io-mode
>>>>> are quite confusing on the web in various guides, saying enabling it could
>>>>> make performance worse in some situations and better in others due to OS
>>>>> file cache.
>>>>>
>>>>> There are also these gluster volume settings, adding to the confusion:
>>>>> Option: performance.strict-o-direct
>>>>> Default Value: off
>>>>> Description: This option when set to off, ignores the O_DIRECT flag.
>>>>>
>>>>> Option: performance.nfs.strict-o-direct
>>>>> Default Value: off
>>>>> Description: This option when set to off, ignores the O_DIRECT flag.
>>>>>
>>>>> Re: 4.0. I moved to 4.0 after finding out that it fixes the
>>>>> disappearing dirs bug related to cluster.readdir-optimize if you remember
>>>>> (http://lists.gluster.org/pipermail/gluster-users/2018-April/033830.html).
>>>>> I was already on 3.13 by then, and 4.0 resolved the
>>>>> issue. It's been stable for me so far, thankfully.
>>>>>
>>>>>
>>>>> Sincerely,
>>>>> Artem
>>>>>
>>>>>
>>>>> On Mon, Apr 9, 2018 at 10:38 PM, Vlad Kopylov <vladkopy at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> You definitely need mount options in /etc/fstab.
>>>>>> Use the ones from here:
>>>>>> http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
>>>>>>
>>>>>> I also went with local mounts to get better performance.
>>>>>>
>>>>>> Also, the 3.12 or 3.10 branches would be preferable for production.
>>>>>>
>>>>>> On Fri, Apr 6, 2018 at 4:12 AM, Artem Russakovskii <archon810 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi again,
>>>>>>>
>>>>>>> I'd like to expand on the performance issues and plead for help.
>>>>>>> Here's one case which shows these odd hiccups:
>>>>>>> https://i.imgur.com/CXBPjTK.gifv.
>>>>>>>
>>>>>>> In this GIF where I switch back and forth between copy operations
>>>>>>> on 2 servers, I'm copying a 10GB dir full of .apk and image files.
>>>>>>>
>>>>>>> On server "hive" I'm copying straight from the main disk to an
>>>>>>> attached volume block (xfs). As you can see, the transfers are relatively
>>>>>>> speedy and don't hiccup.
>>>>>>> On server "citadel" I'm copying the same set of data to a
>>>>>>> 4-replicate gluster which uses block storage as a brick. As you can see,
>>>>>>> performance is much worse, and there are frequent pauses for many seconds
>>>>>>> where nothing seems to be happening - just freezes.
>>>>>>>
>>>>>>> All 4 servers have the same specs, and all of them have performance
>>>>>>> issues with gluster and no such issues when raw xfs block storage is used.
>>>>>>>
>>>>>>> hive has long finished copying the data, while citadel is barely
>>>>>>> chugging along and is expected to take probably half an hour to an hour. I
>>>>>>> have over 1TB of data to migrate, at which point if we went live, I'm not
>>>>>>> even sure gluster would be able to keep up instead of bringing the machines
>>>>>>> and services down.
>>>>>>>
>>>>>>> Here's the cluster config, though it didn't seem to make any
>>>>>>> difference performance-wise before I applied the customizations vs after.
>>>>>>>
>>>>>>> Volume Name: apkmirror_data1
>>>>>>> Type: Replicate
>>>>>>> Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
>>>>>>> Status: Started
>>>>>>> Snapshot Count: 0
>>>>>>> Number of Bricks: 1 x 4 = 4
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
>>>>>>> Brick2: forge:/mnt/forge_block1/apkmirror_data1
>>>>>>> Brick3: hive:/mnt/hive_block1/apkmirror_data1
>>>>>>> Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
>>>>>>> Options Reconfigured:
>>>>>>> cluster.quorum-count: 1
>>>>>>> cluster.quorum-type: fixed
>>>>>>> network.ping-timeout: 5
>>>>>>> network.remote-dio: enable
>>>>>>> performance.rda-cache-limit: 256MB
>>>>>>> performance.readdir-ahead: on
>>>>>>> performance.parallel-readdir: on
>>>>>>> network.inode-lru-limit: 500000
>>>>>>> performance.md-cache-timeout: 600
>>>>>>> performance.cache-invalidation: on
>>>>>>> performance.stat-prefetch: on
>>>>>>> features.cache-invalidation-timeout: 600
>>>>>>> features.cache-invalidation: on
>>>>>>> cluster.readdir-optimize: on
>>>>>>> performance.io-thread-count: 32
>>>>>>> server.event-threads: 4
>>>>>>> client.event-threads: 4
>>>>>>> performance.read-ahead: off
>>>>>>> cluster.lookup-optimize: on
>>>>>>> performance.cache-size: 1GB
>>>>>>> cluster.self-heal-daemon: enable
>>>>>>> transport.address-family: inet
>>>>>>> nfs.disable: on
>>>>>>> performance.client-io-threads: on
>>>>>>>
>>>>>>>
>>>>>>> The mounts are done as follows in /etc/fstab:
>>>>>>> /dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
>>>>>>> localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0
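
For comparison, the mount options suggested in the later replies (quoted earlier on this page) could be folded into the glusterfs fstab entry like this. The helper below is a hypothetical sketch that only formats the line; it says nothing about whether those options actually help for this workload.

```python
def fstab_entry(server, volume, mountpoint, extra_options=()):
    """Format a glusterfs /etc/fstab line with optional extra mount options."""
    options = ["defaults", "_netdev", *extra_options]
    return f"{server}:/{volume} {mountpoint} glusterfs {','.join(options)} 0 0"


# Options quoted from the replies in this thread.
suggested = [
    "negative-timeout=10",
    "attribute-timeout=30",
    "fopen-keep-cache",
    "direct-io-mode=enable",
    "fetch-attempts=5",
]

print(fstab_entry("localhost", "apkmirror_data1", "/mnt/apkmirror_data1"))
print(fstab_entry("localhost", "apkmirror_data1", "/mnt/apkmirror_data1",
                  suggested))
```

The first printed line reproduces the entry currently in use; the second shows the same entry with the suggested options appended, so the two can be tested one after the other.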
>>>>>>>
>>>>>>> I'm really not sure if direct-io-mode mount tweaks would do anything
>>>>>>> here, what the value should be set to, and what it is by default.
>>>>>>>
>>>>>>> The OS is OpenSUSE 42.3, 64-bit. 80GB of RAM, 20 CPUs, hosted by
>>>>>>> Linode.
>>>>>>>
>>>>>>> I'd really appreciate any help in the matter.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Artem
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 5, 2018 at 11:13 PM, Artem Russakovskii <archon810 at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm trying to squeeze performance out of gluster on 4 80GB RAM
>>>>>>>> 20-CPU machines where Gluster runs on attached block storage (Linode) in
>>>>>>>> 4 replicate bricks, and so far everything I tried results in sub-optimal
>>>>>>>> performance.
>>>>>>>>
>>>>>>>> There are many files - mostly images, several million - and many
>>>>>>>> operations take minutes, copying multiple files (even if they're small)
>>>>>>>> suddenly freezes up for seconds at a time, then continues, iostat
>>>>>>>> frequently shows large r_await and w_await with 100% utilization for the
>>>>>>>> attached block device, etc.
>>>>>>>>
>>>>>>>> But anyway, there are many guides out there for small-file
>>>>>>>> performance improvements, but more explanation is needed, and I think more
>>>>>>>> tweaks should be possible.
>>>>>>>>
>>>>>>>> My question today is about performance.cache-size. Is this a size
>>>>>>>> of cache in RAM? If so, how do I view the current cache size to see if it
>>>>>>>> gets full and I should increase its size? Is it advisable to bump it up if
>>>>>>>> I have many tens of gigs of RAM free?
>>>>>>>>
>>>>>>>> More generally, in the last 2 months since I first started working
>>>>>>>> with gluster and set a production system live, I've been feeling frustrated
>>>>>>>> because Gluster has a lot of poorly-documented and confusing options. I
>>>>>>>> really wish documentation could be improved with examples and better
>>>>>>>> explanations.
>>>>>>>>
>>>>>>>> Specifically, it'd be absolutely amazing if the docs offered a
>>>>>>>> strategy for setting each value and ways of determining more optimal
>>>>>>>> values. For example, for performance.cache-size, if it said something like
>>>>>>>> "run command abc to see your current cache size, and if it's hurting, up
>>>>>>>> it, but be aware that it's limited by RAM," it'd be already a huge
>>>>>>>> improvement to the docs. And so on with other options.
>>>>>>>>
>>>>>>>> The gluster team is quite helpful on this mailing list, but in a
>>>>>>>> reactive rather than proactive way. Perhaps it's tunnel vision once you've
>>>>>>>> worked on a project for so long where less technical explanations and even
>>>>>>>> proper documentation of options takes a back seat, but I encourage you to
>>>>>>>> be more proactive about helping us understand and optimize Gluster.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Artem
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users

Artem Russakovskii

2018-Apr-18 06:29 UTC

[Gluster-users] performance.cache-size for high-RAM clients/servers, other tweaks for performance, and improvements to Gluster docs

Btw, I've now noticed at least 5 variations in toggling binary option
values. Are they all interchangeable, or will using the wrong value not
work in some cases?

yes/no
true/false
True/False
on/off
enable/disable

It's quite a confusing/inconsistent practice, especially given that many
options will accept any value without erroring out/validation.
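
Until that question is answered, scripts that read or write these options could normalize the spellings themselves. The sketch below treats the five pairs listed above as equivalent, case-insensitively, and rejects anything else. That equivalence is an assumption made for illustration; whether every Gluster option accepts every spelling is exactly what is being asked here.

```python
# Assumption: the five spellings listed above mean the same thing.
_TRUE_WORDS = {"yes", "true", "on", "enable"}
_FALSE_WORDS = {"no", "false", "off", "disable"}


def normalize_bool(value):
    """Convert one of the listed spellings to a Python bool, else raise."""
    word = value.strip().lower()
    if word in _TRUE_WORDS:
        return True
    if word in _FALSE_WORDS:
        return False
    raise ValueError(f"not a recognized boolean spelling: {value!r}")


for spelling in ["yes", "True", "on", "enable", "no", "False", "off", "disable"]:
    print(spelling, normalize_bool(spelling))
```

Rejecting unknown values up front gives the validation that, per the message above, many options do not perform on their own.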


Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
<https://plus.google.com/+ArtemRussakovskii> | @ArtemR
<http://twitter.com/ArtemR>

On Tue, Apr 17, 2018 at 11:22 PM, Artem Russakovskii <archon810 at
gmail.com>
wrote:
> Thanks for the link. Looking at the status of that doc, it isn't quite
> ready yet, and there's no mention of the option.
>
> Does it mean that whatever is ready now in 4.0.1 is incomplete but can be
> enabled via granular-entry-heal=on, and when it is complete, it'll
become
> the default and the flag will simply go away?
>
> Is there any risk enabling the option now in 4.0.1?
>
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
> <http://twitter.com/ArtemR>
>
> On Tue, Apr 17, 2018 at 11:16 PM, Ravishankar N <ravishankar at
redhat.com>
> wrote:
>
>>
>>
>> On 04/18/2018 10:35 AM, Artem Russakovskii wrote:
>>
>> Hi Ravi,
>>
>> Could you please expand on how these would help?
>>
>> By forcing full here, we move the logic from the CPU to network, thus
>> decreasing CPU utilization, is that right?
>>
>> Yes, 'diff' employs the rchecksum FOP which does a sha256 
checksum which
>> can consume CPU. So yes it is sort of shifting the load from CPU to the
>> network. But if your average file size is small, it would make sense to
>> copy the entire file instead of computing checksums.
>>
>> This is assuming the CPU and disk utilization are caused by the differ
>> and not by lstat and other calls or something.
>>
>>> Option: cluster.data-self-heal-algorithm
>>> Default Value: (null)
>>> Description: Select between "full", "diff". The
"full" algorithm copies
>>> the entire file from source to sink. The "diff" algorithm
copies to sink
>>> only those blocks whose checksums don't match with those of
source. If no
>>> option is configured the option is chosen dynamically as follows:
If the
>>> file does not exist on one of the sinks or empty file exists or if
the
>>> source file size is about the same as page size the entire file
will be
>>> read and written i.e "full" algo, otherwise
"diff" algo is chosen.
>>
>>
>> I really have no idea what this means and how/why it would help. Any
more
>> info on this option?
>>
>>
>> https://github.com/gluster/glusterfs-specs/blob/master/done/
>> GlusterFS%203.8/granular-entry-self-healing.md should help.
>> Regards,
>> Ravi
>>
>>
>> Option: cluster.granular-entry-heal
>>> Default Value: no
>>> Description: If this option is enabled, self-heal will resort to
>>> granular way of recording changelogs and doing entry self-heal.
>>
>>
>> Thank you.
>>
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police <http://www.androidpolice.com>, APK
Mirror
>> <http://www.apkmirror.com/>, Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>> <http://twitter.com/ArtemR>
>>
>> On Tue, Apr 17, 2018 at 9:58 PM, Ravishankar N <ravishankar at
redhat.com>
>> wrote:
>>
>>>
>>> On 04/18/2018 10:14 AM, Artem Russakovskii wrote:
>>>
>>> Following up here on a related and very serious for us issue.
>>>
>>> I took down one of the 4 replicate gluster servers for maintenance
>>> today. There are 2 gluster volumes totaling about 600GB. Not that
much
>>> data. After the server comes back online, it starts auto healing
and pretty
>>> much all operations on gluster freeze for many minutes.
>>>
>>> For example, I was trying to run an ls -alrt in a folder with 7300
>>> files, and it took a good 15-20 minutes before returning.
>>>
>>> During this time, I can see iostat show 100% utilization on the
brick,
>>> heal status takes many minutes to return, glusterfsd uses up tons
of CPU (I
>>> saw it spike to 600%). gluster already has massive performance
issues for
>>> me, but healing after a 4-hour downtime is on another level of bad
perf.
>>>
>>> For example, this command took many minutes to run:
>>>
>>> gluster volume heal androidpolice_data3 info summary
>>> Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
>>> Status: Connected
>>> Total Number of entries: 91
>>> Number of entries in heal pending: 90
>>> Number of entries in split-brain: 0
>>> Number of entries possibly healing: 1
>>>
>>> Brick forge:/mnt/forge_block4/androidpolice_data3
>>> Status: Connected
>>> Total Number of entries: 87
>>> Number of entries in heal pending: 86
>>> Number of entries in split-brain: 0
>>> Number of entries possibly healing: 1
>>>
>>> Brick hive:/mnt/hive_block4/androidpolice_data3
>>> Status: Connected
>>> Total Number of entries: 87
>>> Number of entries in heal pending: 86
>>> Number of entries in split-brain: 0
>>> Number of entries possibly healing: 1
>>>
>>> Brick citadel:/mnt/citadel_block4/androidpolice_data3
>>> Status: Connected
>>> Total Number of entries: 0
>>> Number of entries in heal pending: 0
>>> Number of entries in split-brain: 0
>>> Number of entries possibly healing: 0
>>>
>>>
>>> Statistics showed a diminishing number of failed heals:
>>> ...
>>> Ending time of crawl: Tue Apr 17 21:13:08 2018
>>>
>>> Type of crawl: INDEX
>>> No. of entries healed: 2
>>> No. of entries in split-brain: 0
>>> No. of heal failed entries: 102
>>>
>>> Starting time of crawl: Tue Apr 17 21:13:09 2018
>>>
>>> Ending time of crawl: Tue Apr 17 21:14:30 2018
>>>
>>> Type of crawl: INDEX
>>> No. of entries healed: 4
>>> No. of entries in split-brain: 0
>>> No. of heal failed entries: 91
>>>
>>> Starting time of crawl: Tue Apr 17 21:14:31 2018
>>>
>>> Ending time of crawl: Tue Apr 17 21:15:34 2018
>>>
>>> Type of crawl: INDEX
>>> No. of entries healed: 0
>>> No. of entries in split-brain: 0
>>> No. of heal failed entries: 88
>>> ...
>>>
>>> Eventually, everything heals and goes back to at least where the
roof
>>> isn't on fire anymore.
>>>
>>> The server stats and volume options were given in one of the
previous
>>> replies to this thread.
>>>
>>> Any ideas or things I could run and show the output of to help
diagnose?
>>> I'm also very open to working with someone on the team on a
live debugging
>>> session if there's interest.
>>>
>>>
>>> It is likely that self-heal is causing the CPU spike due to the
flood of
>>> lookups/ locks and checksum fops that the self-heal-daemon sends to
the
>>> bricks.
>>> There's a script to control shd's cpu usage using cgroups.
That should
>>> help in regulating self-heal traffic:
https://review.gluster.org/#/c
>>> /18404/ (see extras/control-cpu-load.sh)
>>> Other self-heal related volume options that you could change are
setting
>>> 'cluster.data-self-heal-algorithm' to 'full' and
'granular-entry-heal'
>>> to 'enable'.  `gluster volume set help` should give you
more information
>>> about these options.
>>> Thanks,
>>> Ravi
>>>
>>>
>>>
>>> Thank you.
>>>
>>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, Android Police <http://www.androidpolice.com>, APK
Mirror
>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>> beerpla.net | +ArtemRussakovskii
>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>> <http://twitter.com/ArtemR>
>>>
>>> On Tue, Apr 10, 2018 at 9:56 AM, Artem Russakovskii <archon810
at gmail.com
>>> > wrote:
>>>
>>>> Hi Vlad,
>>>>
>>>> I actually saw that post already and even asked a question 4
days ago (
>>>> https://serverfault.com/questions/517775/glusterfs-direct-i
>>>> -o-mode#comment1172497_540917). The accepted answer also seems
to go
>>>> against your suggestion to enable direct-io-mode as it says it
should be
>>>> disabled for better performance when used just for file
accesses.
>>>>
>>>> It'd be great if someone from the Gluster team chimed in
about this
>>>> thread.
>>>>
>>>>
>>>> Sincerely,
>>>> Artem
>>>>
>>>> --
>>>> Founder, Android Police <http://www.androidpolice.com>,
APK Mirror
>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>> beerpla.net | +ArtemRussakovskii
>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>> <http://twitter.com/ArtemR>
>>>>
>>>> On Tue, Apr 10, 2018 at 7:01 AM, Vlad Kopylov <vladkopy at
gmail.com>
>>>> wrote:
>>>>
>>>>> Wish I knew or was able to get detailed description of
those options
>>>>> myself.
>>>>> here is direct-io-mode  https://serverfault.com/questi
>>>>> ons/517775/glusterfs-direct-i-o-mode
>>>>> Same as you I ran tests on a large volume of files, finding
that main
>>>>> delays are in attribute calls, ending up with those mount
options to add
>>>>> performance.
>>>>> I discovered those options through basically googling this
user list
>>>>> with people sharing their tests.
>>>>> Not sure I would share your optimism, and rather then going
up I
>>>>> downgraded to 3.12 and have no dir view issue now. Though I
had to recreate
>>>>> the cluster and had to re-add bricks with existing data.
>>>>>
>>>>> On Tue, Apr 10, 2018 at 1:47 AM, Artem Russakovskii <
>>>>> archon810 at gmail.com> wrote:
>>>>>
>>>>>> Hi Vlad,
>>>>>>
>>>>>> I'm using only localhost: mounts.
>>>>>>
>>>>>> Can you please explain what effect each option has on
performance
>>>>>> issues shown in my posts?
"negative-timeout=10,attribute
>>>>>>
-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5"
>>>>>> From what I remember, direct-io-mode=enable didn't
make a difference in my
>>>>>> tests, but I suppose I can try again. The explanations
about direct-io-mode
>>>>>> are quite confusing on the web in various guides,
saying enabling it could
>>>>>> make performance worse in some situations and better in
others due to OS
>>>>>> file cache.
>>>>>>
>>>>>> There are also these gluster volume settings, adding to
the confusion:
>>>>>> Option: performance.strict-o-direct
>>>>>> Default Value: off
>>>>>> Description: This option when set to off, ignores the
O_DIRECT flag.
>>>>>>
>>>>>> Option: performance.nfs.strict-o-direct
>>>>>> Default Value: off
>>>>>> Description: This option when set to off, ignores the
O_DIRECT flag.
>>>>>>
>>>>>> Re: 4.0. I moved to 4.0 after finding out that it fixes
the
>>>>>> disappearing dirs bug related to
cluster.readdir-optimize if you remember (
>>>>>>
http://lists.gluster.org/pipermail/gluster-users/2018-April
>>>>>> /033830.html). I was already on 3.13 by then, and 4.0
resolved the
>>>>>> issue. It's been stable for me so far, thankfully.
>>>>>>
>>>>>>
>>>>>> Sincerely,
>>>>>> Artem
>>>>>>
>>>>>> --
>>>>>> Founder, Android Police
<http://www.androidpolice.com>, APK Mirror
>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>> <https://plus.google.com/+ArtemRussakovskii> |
@ArtemR
>>>>>> <http://twitter.com/ArtemR>
>>>>>>
>>>>>> On Mon, Apr 9, 2018 at 10:38 PM, Vlad Kopylov
<vladkopy at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> you definitely need mount options to /etc/fstab
>>>>>>> use ones from here http://lists.gluster.org/piper
>>>>>>> mail/gluster-users/2018-April/033811.html
>>>>>>>
>>>>>>> I went on with using local mounts to achieve
performance as well
>>>>>>>
>>>>>>> Also, 3.12 or 3.10 branches would be preferable for
production
>>>>>>>
>>>>>>> On Fri, Apr 6, 2018 at 4:12 AM, Artem Russakovskii
<
>>>>>>> archon810 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi again,
>>>>>>>>
>>>>>>>> I'd like to expand on the performance
issues and plead for help.
>>>>>>>> Here's one case which shows these odd
hiccups:
>>>>>>>> https://i.imgur.com/CXBPjTK.gifv.
>>>>>>>>
>>>>>>>> In this GIF where I switch back and forth
between copy operations
>>>>>>>> on 2 servers, I'm copying a 10GB dir full
of .apk and image files.
>>>>>>>>
>>>>>>>> On server "hive" I'm copying
straight from the main disk to an
>>>>>>>> attached volume block (xfs). As you can see,
the transfers are relatively
>>>>>>>> speedy and don't hiccup.
>>>>>>>> On server "citadel" I'm copying
the same set of data to a
>>>>>>>> 4-replicate gluster which uses block storage as
a brick. As you can see,
>>>>>>>> performance is much worse, and there are
frequent pauses for many seconds
>>>>>>>> where nothing seems to be happening - just
freezes.
>>>>>>>>
>>>>>>>> All 4 servers have the same specs, and all of
them have performance
>>>>>>>> issues with gluster and no such issues when raw
xfs block storage is used.
>>>>>>>>
>>>>>>>> hive has long finished copying the data, while
citadel is barely
>>>>>>>> chugging along and is expected to take probably
half an hour to an hour. I
>>>>>>>> have over 1TB of data to migrate, at which
point if we went live, I'm not
>>>>>>>> even sure gluster would be able to keep up
instead of bringing the machines
>>>>>>>> and services down.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's the cluster config, though it
didn't seem to make any
>>>>>>>> difference performance-wise before I applied
the customizations vs after.
>>>>>>>>
>>>>>>>> Volume Name: apkmirror_data1
>>>>>>>> Type: Replicate
>>>>>>>> Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
>>>>>>>> Status: Started
>>>>>>>> Snapshot Count: 0
>>>>>>>> Number of Bricks: 1 x 4 = 4
>>>>>>>> Transport-type: tcp
>>>>>>>> Bricks:
>>>>>>>> Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
>>>>>>>> Brick2: forge:/mnt/forge_block1/apkmirror_data1
>>>>>>>> Brick3: hive:/mnt/hive_block1/apkmirror_data1
>>>>>>>> Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
>>>>>>>> Options Reconfigured:
>>>>>>>> cluster.quorum-count: 1
>>>>>>>> cluster.quorum-type: fixed
>>>>>>>> network.ping-timeout: 5
>>>>>>>> network.remote-dio: enable
>>>>>>>> performance.rda-cache-limit: 256MB
>>>>>>>> performance.readdir-ahead: on
>>>>>>>> performance.parallel-readdir: on
>>>>>>>> network.inode-lru-limit: 500000
>>>>>>>> performance.md-cache-timeout: 600
>>>>>>>> performance.cache-invalidation: on
>>>>>>>> performance.stat-prefetch: on
>>>>>>>> features.cache-invalidation-timeout: 600
>>>>>>>> features.cache-invalidation: on
>>>>>>>> cluster.readdir-optimize: on
>>>>>>>> performance.io-thread-count: 32
>>>>>>>> server.event-threads: 4
>>>>>>>> client.event-threads: 4
>>>>>>>> performance.read-ahead: off
>>>>>>>> cluster.lookup-optimize: on
>>>>>>>> performance.cache-size: 1GB
>>>>>>>> cluster.self-heal-daemon: enable
>>>>>>>> transport.address-family: inet
>>>>>>>> nfs.disable: on
>>>>>>>> performance.client-io-threads: on
>>>>>>>>
>>>>>>>>
>>>>>>>> The mounts are done as follows in /etc/fstab:
>>>>>>>>
>>>>>>>> /dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
>>>>>>>> localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0
>>>>>>>>
>>>>>>>> I'm really not sure if direct-io-mode mount tweaks would do
>>>>>>>> anything here, what the value should be set to, and what it is
>>>>>>>> by default.
>>>>>>>>
>>>>>>>> The OS is OpenSUSE 42.3, 64-bit. 80GB of RAM, 20 CPUs, hosted by
>>>>>>>> Linode.
>>>>>>>>
>>>>>>>> I'd really appreciate any help in the matter.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Artem
>>>>>>>>
>>>>>>>> --
>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>>
>>>>>>>> On Thu, Apr 5, 2018 at 11:13 PM, Artem Russakovskii
>>>>>>>> <archon810 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm trying to squeeze performance out of gluster on 4 80GB RAM
>>>>>>>>> 20-CPU machines where Gluster runs on attached block storage
>>>>>>>>> (Linode) in (4 replicate bricks), and so far everything I tried
>>>>>>>>> results in sub-optimal performance.
>>>>>>>>>
>>>>>>>>> There are many files - mostly images, several million - and many
>>>>>>>>> operations take minutes, copying multiple files (even if they're
>>>>>>>>> small) suddenly freezes up for seconds at a time, then continues,
>>>>>>>>> iostat frequently shows large r_await and w_awaits with 100%
>>>>>>>>> utilization for the attached block device, etc.
>>>>>>>>>
>>>>>>>>> But anyway, there are many guides out there for small-file
>>>>>>>>> performance improvements, but more explanation is needed, and I
>>>>>>>>> think more tweaks should be possible.
>>>>>>>>>
>>>>>>>>> My question today is about performance.cache-size. Is this a size
>>>>>>>>> of cache in RAM? If so, how do I view the current cache size to
>>>>>>>>> see if it gets full and I should increase its size? Is it
>>>>>>>>> advisable to bump it up if I have many tens of gigs of RAM free?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> More generally, in the last 2 months since I first started working
>>>>>>>>> with gluster and set a production system live, I've been feeling
>>>>>>>>> frustrated because Gluster has a lot of poorly-documented and
>>>>>>>>> confusing options. I really wish documentation could be improved
>>>>>>>>> with examples and better explanations.
>>>>>>>>>
>>>>>>>>> Specifically, it'd be absolutely amazing if the docs offered a
>>>>>>>>> strategy for setting each value and ways of determining more
>>>>>>>>> optimal values. For example, for performance.cache-size, if it
>>>>>>>>> said something like "run command abc to see your current cache
>>>>>>>>> size, and if it's hurting, up it, but be aware that it's limited
>>>>>>>>> by RAM," it'd be already a huge improvement to the docs. And so
>>>>>>>>> on with other options.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The gluster team is quite helpful on this mailing list, but in a
>>>>>>>>> reactive rather than proactive way. Perhaps it's tunnel vision
>>>>>>>>> once you've worked on a project for so long where less technical
>>>>>>>>> explanations and even proper documentation of options takes a
>>>>>>>>> back seat, but I encourage you to be more proactive about helping
>>>>>>>>> us understand and optimize Gluster.
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Artem
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>>>>>>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>>>>>>>> beerpla.net | +ArtemRussakovskii
>>>>>>>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>>>>>>>> <http://twitter.com/ArtemR>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users at gluster.org
>>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>
>>

Ravishankar N

2018-Apr-18 06:49 UTC


[Gluster-users] performance.cache-size for high-RAM clients/servers, other tweaks for performance, and improvements to Gluster docs

On 04/18/2018 11:59 AM, Artem Russakovskii wrote:
> Btw, I've now noticed at least 5 variations in toggling binary option 
> values. Are they all interchangeable, or will using the wrong value 
> not work in some cases?
>
> yes/no
> true/false
> True/False
> on/off
> enable/disable
>
> It's quite a confusing/inconsistent practice, especially given that 
> many options will accept any value without erroring out/validation.
All these options are okay.
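
For scripts that compare or template volume options, the spellings listed above can be folded into one canonical form. This is a minimal sketch that assumes only the five pairs named in this thread; `normalize_bool_option` is a hypothetical helper, not Gluster's own parser, which may accept other spellings.

```python
# Hypothetical helper: fold the boolean spellings from this thread into on/off.
# This is NOT Gluster's own option parser; it only covers the listed pairs.
_TRUE = {"yes", "true", "on", "enable"}
_FALSE = {"no", "false", "off", "disable"}


def normalize_bool_option(value: str) -> str:
    """Return "on" or "off" for a known spelling, else raise ValueError."""
    key = value.strip().lower()  # "True" and "true" are treated alike
    if key in _TRUE:
        return "on"
    if key in _FALSE:
        return "off"
    raise ValueError(f"not a recognised boolean spelling: {value!r}")


if __name__ == "__main__":
    for raw in ("yes", "True", "enable", "off", "False"):
        print(raw, "->", normalize_bool_option(raw))
```

Raising on unknown input is a deliberate choice here, since the complaint above is that many options accept any value without validation.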
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror 
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net <http://beerpla.net/> | +ArtemRussakovskii 
> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR 
> <http://twitter.com/ArtemR>
>
> On Tue, Apr 17, 2018 at 11:22 PM, Artem Russakovskii 
> <archon810 at gmail.com <mailto:archon810 at gmail.com>> wrote:
>
>     Thanks for the link. Looking at the status of that doc, it isn't
>     quite ready yet, and there's no mention of the option.
>
No, this is a completed feature available since 3.8 IIRC. You can use it 
safely. There is a difference in how to enable it though. Instead of 
using 'gluster volume set ...', you need to use 'gluster volume heal
<volname> granular-entry-heal enable' to turn it on. If there are no 
pending heals, it will run successfully. Otherwise you need to wait 
until heals are over (i.e. heal info shows zero entries). Just follow 
what the CLI says and you should be fine.
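
The "wait until heal info shows zero entries" check can be scripted. This is a rough sketch, assuming the `heal info summary` output keeps the layout quoted elsewhere in this thread; `pending_entries` and `heals_finished` are made-up helper names, and the sample text is abbreviated from that paste.

```python
import re

# Abbreviated sample in the layout of `gluster volume heal <volname> info summary`
# as quoted in this thread (two of the four bricks shown).
SAMPLE = """\
Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
Status: Connected
Total Number of entries: 91
Number of entries in heal pending: 90
Number of entries in split-brain: 0
Number of entries possibly healing: 1

Brick citadel:/mnt/citadel_block4/androidpolice_data3
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
"""


def pending_entries(summary: str) -> dict:
    """Map each brick to its "Total Number of entries" count."""
    totals = {}
    brick = None
    for line in summary.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        else:
            m = re.match(r"Total Number of entries:\s*(\d+)$", line)
            if m and brick is not None:
                totals[brick] = int(m.group(1))
    return totals


def heals_finished(summary: str) -> bool:
    """True only when at least one brick was parsed and all report zero."""
    totals = pending_entries(summary)
    return bool(totals) and all(n == 0 for n in totals.values())


if __name__ == "__main__":
    print(pending_entries(SAMPLE))
    print("heals finished:", heals_finished(SAMPLE))
```

In practice the summary text would come from running the CLI; the sketch only covers the parsing step and leaves the decision to enable the option to the operator.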

-Ravi
>
>     Does it mean that whatever is ready now in 4.0.1 is incomplete but
>     can be enabled via granular-entry-heal=on, and when it is
>     complete, it'll become the default and the flag will simply go
>     away?
>
>     Is there any risk enabling the option now in 4.0.1?
>
>
>     Sincerely,
>     Artem
>
>     --
>     Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>     <http://www.apkmirror.com/>, Illogical Robot LLC
>     beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
>     <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>     <http://twitter.com/ArtemR>
>
>     On Tue, Apr 17, 2018 at 11:16 PM, Ravishankar N
>     <ravishankar at redhat.com <mailto:ravishankar at redhat.com>> wrote:
>
>
>
>         On 04/18/2018 10:35 AM, Artem Russakovskii wrote:
>>         Hi Ravi,
>>
>>         Could you please expand on how these would help?
>>
>>         By forcing full here, we move the logic from the CPU to
>>         network, thus decreasing CPU utilization, is that right?
>         Yes, 'diff' employs the rchecksum FOP which does a sha256
>         checksum which can consume CPU. So yes it is sort of shifting
>         the load from CPU to the network. But if your average file
>         size is small, it would make sense to copy the entire file
>         instead of computing checksums.
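
To make the trade-off concrete, here is a toy sketch of the two strategies. It is not the AFR implementation: the block size, the in-memory byte strings, and the function names are assumptions for illustration, and hashlib's SHA-256 stands in for the rchecksum FOP.

```python
import hashlib

BLOCK = 4  # toy block size (assumption); real heals work on much larger blocks


def heal_full(source: bytes) -> tuple:
    """'full': copy everything, no checksums. Returns (new_sink, bytes_copied)."""
    return source, len(source)


def heal_diff(source: bytes, sink: bytes) -> tuple:
    """'diff': checksum each block on both sides, copy only mismatching blocks.

    Returns (new_sink, bytes_copied, checksums_computed).
    """
    out = bytearray()
    copied = 0
    checksums = 0
    for off in range(0, len(source), BLOCK):
        src_blk = source[off:off + BLOCK]
        snk_blk = sink[off:off + BLOCK]
        checksums += 2  # one SHA-256 per side: this is the CPU cost
        if hashlib.sha256(src_blk).digest() == hashlib.sha256(snk_blk).digest():
            out += snk_blk  # block already matches, nothing crosses the network
        else:
            out += src_blk
            copied += len(src_blk)
    return bytes(out), copied, checksums


if __name__ == "__main__":
    src = b"AAAABBBBCCCCDDDD"
    snk = b"AAAAxxxxCCCCDDDD"  # one stale block
    print(heal_full(src))
    print(heal_diff(src, snk))
```

With small files there are few blocks to skip, so the checksum work buys little; that is the case where copying the whole file is the cheaper choice.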
>
>>         This is assuming the CPU and disk utilization are caused by
>>         the differ and not by lstat and other calls or something.
>>
>>             Option: cluster.data-self-heal-algorithm
>>             Default Value: (null)
>>             Description: Select between "full", "diff". The "full"
>>             algorithm copies the entire file from source to sink. The
>>             "diff" algorithm copies to sink only those blocks whose
>>             checksums don't match with those of source. If no option
>>             is configured the option is chosen dynamically as
>>             follows: If the file does not exist on one of the sinks
>>             or empty file exists or if the source file size is about
>>             the same as page size the entire file will be read and
>>             written i.e "full" algo, otherwise "diff" algo is chosen.
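
The dynamic selection in that help text can be paraphrased as a small decision function. This is one reading of the description, not the actual AFR code; the 4096-byte page size, and treating "about the same as page size" as "at most one page", are assumptions.

```python
from typing import Optional

PAGE_SIZE = 4096  # assumed page size in bytes


def choose_heal_algorithm(source_size: int,
                          sink_size: Optional[int],
                          configured: Optional[str] = None) -> str:
    """Pick "full" or "diff" following the option description.

    sink_size is None when the file does not exist on the sink.
    """
    if configured in ("full", "diff"):
        return configured  # an explicit setting wins
    if sink_size is None or sink_size == 0:
        return "full"  # missing or empty file on the sink
    if source_size <= PAGE_SIZE:
        return "full"  # assumption: "about page size" means at most one page
    return "diff"


if __name__ == "__main__":
    print(choose_heal_algorithm(10_000_000, None))
    print(choose_heal_algorithm(10_000_000, 9_000_000))
```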
>>
>>
>>         I really have no idea what this means and how/why it would
>>         help. Any more info on this option?
>
>         https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.8/granular-entry-self-healing.md
>         should help.
>         Regards,
>         Ravi
>
>
>>             Option: cluster.granular-entry-heal
>>             Default Value: no
>>             Description: If this option is enabled, self-heal will
>>             resort to granular way of recording changelogs and doing
>>             entry self-heal.
>>
>>
>>         Thank you.
>>
>>
>>         Sincerely,
>>         Artem
>>
>>         --
>>         Founder, Android Police <http://www.androidpolice.com>,
APK
>>         Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
>>         beerpla.net <http://beerpla.net/> | +ArtemRussakovskii
>>         <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>         <http://twitter.com/ArtemR>
>>
>>         On Tue, Apr 17, 2018 at 9:58 PM, Ravishankar N
>>         <ravishankar at redhat.com <mailto:ravishankar at
redhat.com>> wrote:
>>
>>
>>             On 04/18/2018 10:14 AM, Artem Russakovskii wrote:
>>>             Following up here on a related and very serious for us
>>>             issue.
>>>
>>>             I took down one of the 4 replicate gluster servers for
>>>             maintenance today. There are 2 gluster volumes totaling
>>>             about 600GB. Not that much data. After the server comes
>>>             back online, it starts auto healing and pretty much all
>>>             operations on gluster freeze for many minutes.
>>>
>>>             For example, I was trying to run an ls -alrt in a folder
>>>             with 7300 files, and it took a good 15-20 minutes before
>>>             returning.
>>>
>>>             During this time, I can see iostat show 100% utilization
>>>             on the brick, heal status takes many minutes to return,
>>>             glusterfsd uses up tons of CPU (I saw it spike to 600%).
>>>             gluster already has massive performance issues for me,
>>>             but healing after a 4-hour downtime is on another level
>>>             of bad perf.
>>>
>>>             For example, this command took many minutes to run:
>>>
>>>             gluster volume heal androidpolice_data3 info summary
>>>             Brick nexus2:/mnt/nexus2_block4/androidpolice_data3
>>>             Status: Connected
>>>             Total Number of entries: 91
>>>             Number of entries in heal pending: 90
>>>             Number of entries in split-brain: 0
>>>             Number of entries possibly healing: 1
>>>
>>>             Brick forge:/mnt/forge_block4/androidpolice_data3
>>>             Status: Connected
>>>             Total Number of entries: 87
>>>             Number of entries in heal pending: 86
>>>             Number of entries in split-brain: 0
>>>             Number of entries possibly healing: 1
>>>
>>>             Brick hive:/mnt/hive_block4/androidpolice_data3
>>>             Status: Connected
>>>             Total Number of entries: 87
>>>             Number of entries in heal pending: 86
>>>             Number of entries in split-brain: 0
>>>             Number of entries possibly healing: 1
>>>
>>>             Brick citadel:/mnt/citadel_block4/androidpolice_data3
>>>             Status: Connected
>>>             Total Number of entries: 0
>>>             Number of entries in heal pending: 0
>>>             Number of entries in split-brain: 0
>>>             Number of entries possibly healing: 0
>>>
>>>
>>>             Statistics showed a diminishing number of failed heals:
>>>             ...
>>>             Ending time of crawl: Tue Apr 17 21:13:08 2018
>>>
>>>             Type of crawl: INDEX
>>>             No. of entries healed: 2
>>>             No. of entries in split-brain: 0
>>>             No. of heal failed entries: 102
>>>
>>>             Starting time of crawl: Tue Apr 17 21:13:09 2018
>>>
>>>             Ending time of crawl: Tue Apr 17 21:14:30 2018
>>>
>>>             Type of crawl: INDEX
>>>             No. of entries healed: 4
>>>             No. of entries in split-brain: 0
>>>             No. of heal failed entries: 91
>>>
>>>             Starting time of crawl: Tue Apr 17 21:14:31 2018
>>>
>>>             Ending time of crawl: Tue Apr 17 21:15:34 2018
>>>
>>>             Type of crawl: INDEX
>>>             No. of entries healed: 0
>>>             No. of entries in split-brain: 0
>>>             No. of heal failed entries: 88
>>>             ...
>>>
>>>             Eventually, everything heals and goes back to at least
>>>             where the roof isn't on fire anymore.
>>>
>>>             The server stats and volume options were given in one of
>>>             the previous replies to this thread.
>>>
>>>             Any ideas or things I could run and show the output of
>>>             to help diagnose? I'm also very open to working with
>>>             someone on the team on a live debugging session if
>>>             there's interest.
>>
>>             It is likely that self-heal is causing the CPU spike due
>>             to the flood of lookups/ locks and checksum fops that the
>>             self-heal-daemon sends to the bricks.
>>             There's a script to control shd's cpu usage using
>>             cgroups. That should help in regulating self-heal
>>             traffic: https://review.gluster.org/#/c/18404/ (see
>>             extras/control-cpu-load.sh)
>>             Other self-heal related volume options that you could
>>             change are setting 'cluster.data-self-heal-algorithm' to
>>             'full' and 'granular-entry-heal' to 'enable'. `gluster
>>             volume set help` should give you more information about
>>             these options.
>>             Thanks,
>>             Ravi
>>
>>
>>>
>>>             Thank you.
>>>
>>>
>>>             Sincerely,
>>>             Artem
>>>
>>>             --
>>>             Founder, Android Police
<http://www.androidpolice.com>,
>>>             APK Mirror <http://www.apkmirror.com/>, Illogical
Robot LLC
>>>             beerpla.net <http://beerpla.net/> |
+ArtemRussakovskii
>>>             <https://plus.google.com/+ArtemRussakovskii> |
@ArtemR
>>>             <http://twitter.com/ArtemR>
>>>
>>>             On Tue, Apr 10, 2018 at 9:56 AM, Artem Russakovskii
>>>             <archon810 at gmail.com <mailto:archon810 at
gmail.com>> wrote:
>>>
>>>                 Hi Vlad,
>>>
>>>                 I actually saw that post already and even asked a
>>>                 question 4 days ago
>>>                 (https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode#comment1172497_540917).
>>>                 The accepted answer also seems to go against your
>>>                 suggestion to enable direct-io-mode as it says it
>>>                 should be disabled for better performance when used
>>>                 just for file accesses.
>>>
>>>                 It'd be great if someone from the Gluster team
>>>                 chimed in about this thread.
>>>
>>>
>>>                 Sincerely,
>>>                 Artem
>>>
>>>                 --
>>>                 Founder, Android Police
>>>                 <http://www.androidpolice.com>, APK Mirror
>>>                 <http://www.apkmirror.com/>, Illogical Robot
LLC
>>>                 beerpla.net <http://beerpla.net/> |
>>>                 +ArtemRussakovskii
>>>                 <https://plus.google.com/+ArtemRussakovskii>
|
>>>                 @ArtemR <http://twitter.com/ArtemR>
>>>
>>>                 On Tue, Apr 10, 2018 at 7:01 AM, Vlad Kopylov
>>>                 <vladkopy at gmail.com <mailto:vladkopy at
gmail.com>> wrote:
>>>
>>>                     Wish I knew or was able to get detailed
>>>                     description of those options myself.
>>>                     here is direct-io-mode
>>>                     https://serverfault.com/questions/517775/glusterfs-direct-i-o-mode
>>>                     Same as you I ran tests on a large volume of
>>>                     files, finding that main delays are in attribute
>>>                     calls, ending up with those mount options to add
>>>                     performance.
>>>                     I discovered those options through basically
>>>                     googling this user list with people sharing
>>>                     their tests.
>>>                     Not sure I would share your optimism, and rather
>>>                     than going up I downgraded to 3.12 and have no
>>>                     dir view issue now. Though I had to recreate the
>>>                     cluster and had to re-add bricks with existing
>>>                     data.
>>>
>>>                     On Tue, Apr 10, 2018 at 1:47 AM, Artem
>>>                     Russakovskii <archon810 at gmail.com
>>>                     <mailto:archon810 at gmail.com>>
wrote:
>>>
>>>                         Hi Vlad,
>>>
>>>                         I'm using only localhost: mounts.
>>>
>>>                         Can you please explain what effect each
>>>                         option has on performance issues shown in my
>>>                         posts?
>>>                         "negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5"
>>>                         From what I remember, direct-io-mode=enable
>>>                         didn't make a difference in my tests, but I
>>>                         suppose I can try again. The explanations
>>>                         about direct-io-mode are quite confusing on
>>>                         the web in various guides, saying enabling
>>>                         it could make performance worse in some
>>>                         situations and better in others due to OS
>>>                         file cache.
>>>
>>>                         There are also these gluster volume
>>>                         settings, adding to the confusion:
>>>                         Option: performance.strict-o-direct
>>>                         Default Value: off
>>>                         Description: This option when set to off,
>>>                         ignores the O_DIRECT flag.
>>>
>>>                         Option: performance.nfs.strict-o-direct
>>>                         Default Value: off
>>>                         Description: This option when set to off,
>>>                         ignores the O_DIRECT flag.
>>>
>>>                         Re: 4.0. I moved to 4.0 after finding out
>>>                         that it fixes the disappearing dirs bug
>>>                         related to cluster.readdir-optimize if you
>>>                         remember
>>>                        
(http://lists.gluster.org/pipermail/gluster-users/2018-April/033830.html
>>>                        
<http://lists.gluster.org/pipermail/gluster-users/2018-April/033830.html>).
>>>                         I was already on 3.13 by then, and 4.0
>>>                         resolved the issue. It's been stable
for me
>>>                         so far, thankfully.
>>>
>>>
>>>                         Sincerely,
>>>                         Artem
>>>
>>>                         --
>>>                         Founder, Android Police
>>>                         <http://www.androidpolice.com>, APK
Mirror
>>>                         <http://www.apkmirror.com/>,
Illogical Robot LLC
>>>                         beerpla.net <http://beerpla.net/> |
>>>                         +ArtemRussakovskii
>>>                        
<https://plus.google.com/+ArtemRussakovskii>
>>>                         | @ArtemR <http://twitter.com/ArtemR>
>>>
>>>                         On Mon, Apr 9, 2018 at 10:38 PM, Vlad
>>>                         Kopylov <vladkopy at gmail.com
>>>                         <mailto:vladkopy at gmail.com>>
wrote:
>>>
>>>                             you definitely need mount options to
>>>                             /etc/fstab
>>>                             use ones from here
>>>                            
http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
>>>                            
<http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html>
>>>
>>>                             I went on with using local mounts to
>>>                             achieve performance as well
>>>
>>>                             Also, 3.12 or 3.10 branches would be
>>>                             preferable for production
>>>
>>>                             On Fri, Apr 6, 2018 at 4:12 AM, Artem
>>>                             Russakovskii <archon810 at gmail.com
>>>                             <mailto:archon810 at
gmail.com>> wrote:
>>>
>>>                                 Hi again,
>>>
>>>                                 I'd like to expand on the
>>>                                 performance issues and plead for
>>>                                 help. Here's one case which
shows
>>>                                 these odd hiccups:
>>>                                 https://i.imgur.com/CXBPjTK.gifv
>>>                                
<https://i.imgur.com/CXBPjTK.gifv>.
>>>
>>>                                 In this GIF where I switch back and
>>>                                 forth between copy operations on 2
>>>                                 servers, I'm copying a 10GB dir
full
>>>                                 of .apk and image files.
>>>
>>>                                 On server "hive" I'm
copying
>>>                                 straight from the main disk to an
>>>                                 attached volume block (xfs). As you
>>>                                 can see, the transfers are
>>>                                 relatively speedy and don't
hiccup.
>>>                                 On server "citadel"
I'm copying the
>>>                                 same set of data to a 4-replicate
>>>                                 gluster which uses block storage as
>>>                                 a brick. As you can see,
performance
>>>                                 is much worse, and there are
>>>                                 frequent pauses for many seconds
>>>                                 where nothing seems to be happening
>>>                                 - just freezes.
>>>
>>>                                 All 4 servers have the same specs,
>>>                                 and all of them have performance
>>>                                 issues with gluster and no such
>>>                                 issues when raw xfs block storage
is
>>>                                 used.
>>>
>>>                                 hive has long finished copying the
>>>                                 data, while citadel is barely
>>>                                 chugging along and is expected to
>>>                                 take probably half an hour to an
>>>                                 hour. I have over 1TB of data to
>>>                                 migrate, at which point if we went
>>>                                 live, I'm not even sure gluster
>>>                                 would be able to keep up instead of
>>>                                 bringing the machines and services
down.
>>>
>>>
>>>
>>>                                 Here's the cluster config,
though it
>>>                                 didn't seem to make any
difference
>>>                                 performance-wise before I applied
>>>                                 the customizations vs after.
>>>
>>>                                 Volume Name: apkmirror_data1
>>>                                 Type: Replicate
>>>                                 Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
>>>                                 Status: Started
>>>                                 Snapshot Count: 0
>>>                                 Number of Bricks: 1 x 4 = 4
>>>                                 Transport-type: tcp
>>>                                 Bricks:
>>>                                 Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
>>>                                 Brick2: forge:/mnt/forge_block1/apkmirror_data1
>>>                                 Brick3: hive:/mnt/hive_block1/apkmirror_data1
>>>                                 Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
>>>                                 Options Reconfigured:
>>>                                 cluster.quorum-count: 1
>>>                                 cluster.quorum-type: fixed
>>>                                 network.ping-timeout: 5
>>>                                 network.remote-dio: enable
>>>                                 performance.rda-cache-limit: 256MB
>>>                                 performance.readdir-ahead: on
>>>                                 performance.parallel-readdir: on
>>>                                 network.inode-lru-limit: 500000
>>>                                 performance.md-cache-timeout: 600
>>>                                 performance.cache-invalidation: on
>>>                                 performance.stat-prefetch: on
>>>                                 features.cache-invalidation-timeout: 600
>>>                                 features.cache-invalidation: on
>>>                                 cluster.readdir-optimize: on
>>>                                 performance.io-thread-count: 32
>>>                                 server.event-threads: 4
>>>                                 client.event-threads: 4
>>>                                 performance.read-ahead: off
>>>                                 cluster.lookup-optimize: on
>>>                                 performance.cache-size: 1GB
>>>                                 cluster.self-heal-daemon: enable
>>>                                 transport.address-family: inet
>>>                                 nfs.disable: on
>>>                                 performance.client-io-threads: on
>>>
>>>
>>>                                 The mounts are done as follows in
>>>                                 /etc/fstab:
>>>                                 /dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
>>>                                 localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0
>>>
>>>                                 I'm really not sure if
>>>                                 direct-io-mode mount tweaks would
do
>>>                                 anything here, what the value
should
>>>                                 be set to, and what it is by
default.
>>>
>>>                                 The OS is OpenSUSE 42.3, 64-bit.
>>>                                 80GB of RAM, 20 CPUs, hosted by
Linode.
>>>
>>>                                 I'd really appreciate any help
in
>>>                                 the matter.
>>>
>>>                                 Thank you.
>>>
>>>
>>>                                 Sincerely,
>>>                                 Artem
>>>
>>>                                 --
>>>                                 Founder, Android Police
>>>                                
<http://www.androidpolice.com>, APK
>>>                                 Mirror
<http://www.apkmirror.com/>,
>>>                                 Illogical Robot LLC
>>>                                 beerpla.net
<http://beerpla.net/> |
>>>                                 +ArtemRussakovskii
>>>                                
<https://plus.google.com/+ArtemRussakovskii>
>>>                                 | @ArtemR
<http://twitter.com/ArtemR>
>>>
>>>                                 On Thu, Apr 5, 2018 at 11:13 PM,
>>>                                 Artem Russakovskii
>>>                                 <archon810 at gmail.com
>>>                                 <mailto:archon810 at
gmail.com>> wrote:
>>>
>>>                                     Hi,
>>>
>>>                                     I'm trying to squeeze
>>>                                     performance out of gluster on 4
>>>                                     80GB RAM 20-CPU machines where
>>>                                     Gluster runs on attached block
>>>                                     storage (Linode) in (4
replicate
>>>                                     bricks), and so far everything
I
>>>                                     tried results in sub-optimal
>>>                                     performance.
>>>
>>>                                     There are many files - mostly
>>>                                     images, several million - and
>>>                                     many operations take minutes,
>>>                                     copying multiple files (even if
>>>                                     they're small) suddenly
freezes
>>>                                     up for seconds at a time, then
>>>                                     continues, iostat frequently
>>>                                     shows large r_await and
w_awaits
>>>                                     with 100% utilization for the
>>>                                     attached block device, etc.
>>>
>>>                                     But anyway, there are many
>>>                                     guides out there for small-file
>>>                                     performance improvements, but
>>>                                     more explanation is needed, and
>>>                                     I think more tweaks should be
>>>                                     possible.
>>>
>>>                                     My question today is
>>>                                     about performance.cache-size. Is
>>>                                     this the size of a cache in RAM? If
>>>                                     so, how do I view the current
>>>                                     cache size to see if it gets
>>>                                     full and I should increase its
>>>                                     size? Is it advisable to bump it
>>>                                     up if I have many tens of gigs
>>>                                     of RAM free?
>>>
>>>
>>>
>>>                                     More generally, in the last 2
>>>                                     months since I first started
>>>                                     working with gluster and set a
>>>                                     production system live, I've
>>>                                     been feeling frustrated because
>>>                                     Gluster has a lot of
>>>                                     poorly-documented and confusing
>>>                                     options. I really wish
>>>                                     documentation could be improved
>>>                                     with examples and better
>>>                                     explanations.
>>>
>>>                                     Specifically, it'd be absolutely
>>>                                     amazing if the docs offered a
>>>                                     strategy for setting each value
>>>                                     and ways of determining more
>>>                                     optimal values. For example,
>>>                                     for performance.cache-size, if
>>>                                     it said something like "run
>>>                                     command abc to see your current
>>>                                     cache size, and if it's hurting,
>>>                                     up it, but be aware that it's
>>>                                     limited by RAM," it'd already be
>>>                                     a huge improvement to the docs.
>>>                                     And so on with other options.
>>>
>>>
>>>
>>>                                     The gluster team is quite
>>>                                     helpful on this mailing list,
>>>                                     but in a reactive rather than
>>>                                     proactive way. Perhaps it's
>>>                                     tunnel vision once you've worked
>>>                                     on a project for so long, where
>>>                                     less technical explanations and
>>>                                     even proper documentation of
>>>                                     options take a back seat, but I
>>>                                     encourage you to be more
>>>                                     proactive about helping us
>>>                                     understand and optimize Gluster.
>>>
>>>                                     Thank you.
>>>
>>>                                     Sincerely,
>>>                                     Artem
>>>
>>>                                     --
>>>                                     Founder, Android Police
>>>                                     <http://www.androidpolice.com>,
>>>                                     APK Mirror
>>>                                     <http://www.apkmirror.com/>,
>>>                                     Illogical Robot LLC
>>>                                     beerpla.net
>>>                                     <http://beerpla.net/> |
>>>                                     +ArtemRussakovskii
>>>                                     <https://plus.google.com/+ArtemRussakovskii>
>>>                                     | @ArtemR
>>>                                     <http://twitter.com/ArtemR>
>>>
>>>
>>>
>>>                                 _______________________________________________
>>>                                 Gluster-users mailing list
>>>                                 Gluster-users at gluster.org
>>>                                 <mailto:Gluster-users at gluster.org>
>>>                                 http://lists.gluster.org/mailman/listinfo/gluster-users
>>>                                 <http://lists.gluster.org/mailman/listinfo/gluster-users>
>>>
>>
>>
>
>
>
