Artem Russakovskii
2018-Feb-27 13:52 UTC
[Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first
Any updates on this one?

On Mon, Feb 5, 2018 at 8:18 AM, Tom Fite <tomfite at gmail.com> wrote:

> Hi all,
>
> I have seen this issue as well, on Gluster 3.12.1. (3 bricks per box,
> 2 boxes, distributed-replicate.) My testing shows the same thing --
> running a find on a directory dramatically increases lstat performance.
> To add another clue, the performance degrades again after issuing a call
> to reset the system's cache of dentries and inodes:
>
> # sync; echo 2 > /proc/sys/vm/drop_caches
>
> I think that this shows that it's the system cache that's actually doing
> the heavy lifting here. There are a couple of sysctl tunables that I've
> found help out with this.
>
> See here:
>
> http://docs.gluster.org/en/latest/Administrator%20Guide/Linux%20Kernel%20Tuning/
>
> Contrary to what that doc says, I've found that setting
> vm.vfs_cache_pressure to a low value increases performance by allowing
> more dentries and inodes to be retained in the cache.
>
> # Set the swappiness to avoid swap when possible.
> vm.swappiness = 10
>
> # Set the cache pressure to prefer inode and dentry cache over file cache.
> # This is done to keep as many dentries and inodes in cache as possible,
> # which dramatically improves gluster small file performance.
> vm.vfs_cache_pressure = 25
>
> For comparison, my config is:
>
> Volume Name: gv0
> Type: Tier
> Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
> Status: Started
> Snapshot Count: 13
> Number of Bricks: 8
> Transport-type: tcp
> Hot Tier :
> Hot Tier Type : Replicate
> Number of Bricks: 1 x 2 = 2
> Brick1: gluster2:/data/hot_tier/gv0
> Brick2: gluster1:/data/hot_tier/gv0
> Cold Tier:
> Cold Tier Type : Distributed-Replicate
> Number of Bricks: 3 x 2 = 6
> Brick3: gluster1:/data/brick1/gv0
> Brick4: gluster2:/data/brick1/gv0
> Brick5: gluster1:/data/brick2/gv0
> Brick6: gluster2:/data/brick2/gv0
> Brick7: gluster1:/data/brick3/gv0
> Brick8: gluster2:/data/brick3/gv0
> Options Reconfigured:
> performance.cache-max-file-size: 128MB
> cluster.readdir-optimize: on
> cluster.watermark-hi: 95
> features.ctr-sql-db-cachesize: 262144
> cluster.read-freq-threshold: 5
> cluster.write-freq-threshold: 2
> features.record-counters: on
> cluster.tier-promote-frequency: 15000
> cluster.tier-pause: off
> cluster.tier-compact: on
> cluster.tier-mode: cache
> features.ctr-enabled: on
> performance.cache-refresh-timeout: 60
> performance.stat-prefetch: on
> server.outstanding-rpc-limit: 2056
> cluster.lookup-optimize: on
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> features.barrier: disable
> client.event-threads: 4
> server.event-threads: 4
> performance.cache-size: 1GB
> network.inode-lru-limit: 90000
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> performance.quick-read: on
> performance.io-cache: on
> performance.nfs.write-behind-window-size: 4MB
> performance.write-behind-window-size: 4MB
> performance.nfs.io-threads: off
> network.tcp-window-size: 1048576
> performance.rda-cache-limit: 64MB
> performance.flush-behind: on
> server.allow-insecure: on
> cluster.tier-demote-frequency: 18000
> cluster.tier-max-files: 1000000
> cluster.tier-max-promote-file-size: 10485760
> cluster.tier-max-mb: 64000
> features.ctr-sql-db-wal-autocheckpoint: 2500
> cluster.tier-hot-compact-frequency: 86400
> cluster.tier-cold-compact-frequency: 86400
> performance.readdir-ahead: off
> cluster.watermark-low: 50
> storage.build-pgfid: on
> performance.rda-request-size: 128KB
> performance.rda-low-wmark: 4KB
> cluster.min-free-disk: 5%
> auto-delete: enable
>
> On Sun, Feb 4, 2018 at 9:44 PM, Amar Tumballi <atumball at redhat.com> wrote:
>
>> Thanks for the report, Artem.
>>
>> Looks like the issue is about the cache warming up. Specifically, I
>> suspect rsync is doing a 'readdir(), stat(), file operations' loop,
>> whereas when a find or ls is issued, we get a 'readdirp()' request,
>> which returns the stat information along with the entries and also
>> makes sure the cache is up to date (at the md-cache layer).
>>
>> Note that this is just a hypothesis off the top of my head; we surely
>> need to analyse and debug more thoroughly for a proper explanation.
>> Someone on my team will look at it soon.
>>
>> Regards,
>> Amar
>>
>> On Mon, Feb 5, 2018 at 7:25 AM, Vlad Kopylov <vladkopy at gmail.com> wrote:
>>
>>> Are you mounting it to the local bricks?
>>>
>>> I'm struggling with the same performance issues. Try the volume
>>> settings from this thread:
>>> http://lists.gluster.org/pipermail/gluster-users/2018-January/033397.html
>>> performance.stat-prefetch: on might be it.
>>>
>>> It seems like once it's in the cache it is fast -- it's those stat
>>> fetches, which seem to come from .glusterfs, that are slow.
>>>
>>> On Sun, Feb 4, 2018 at 3:45 AM, Artem Russakovskii <archon810 at gmail.com>
>>> wrote:
>>> > An update, and a very interesting one!
>>> >
>>> > After I started stracing rsync, all I could see was lstat calls,
>>> > quite slow ones, over and over, which is expected.
>>> >
>>> > For example:
>>> > lstat("uploads/2016/10/nexus2cee_DSC05339_thumb-161x107.jpg",
>>> > {st_mode=S_IFREG|0664, st_size=4043, ...}) = 0
>>> >
>>> > I googled around and found
>>> > https://gist.github.com/nh2/1836415489e2132cf85ed3832105fcc1, which
>>> > describes this exact issue with gluster, rsync and xfs.
>>> >
>>> > Here's the craziest finding so far. If while rsync is running (or
>>> > right before), I run /bin/ls or find on the same gluster dirs, it
>>> > immediately speeds up rsync by a factor of 100 or maybe even 1000.
>>> > It's absolutely insane.
>>> >
>>> > I'm stracing the rsync run, and the slow lstat calls flood in at an
>>> > incredible speed as soon as ls or find runs. Several hundred files
>>> > per minute (excruciatingly slow) becomes thousands or even tens of
>>> > thousands of files a second.
>>> >
>>> > What do you make of this?
>>
>> --
>> Amar Tumballi (amarts)
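For reference, the kernel tunables Tom recommends above can be tried at runtime and then persisted with a sysctl drop-in. This is only a minimal sketch: the values are the ones from his mail, the drop-in file name is an arbitrary example, and the right numbers depend on how much RAM you can spare for the dentry/inode cache.

  # Apply at runtime to test the effect on lstat-heavy workloads.
  sysctl -w vm.swappiness=10
  sysctl -w vm.vfs_cache_pressure=25

  # Persist across reboots (file name is just an example).
  cat > /etc/sysctl.d/90-gluster-tuning.conf <<'EOF'
  # Avoid swapping when possible.
  vm.swappiness = 10
  # Prefer keeping dentries and inodes cached over page cache; this is
  # what keeps repeated lstat() calls on the gluster mount fast.
  vm.vfs_cache_pressure = 25
  EOF
  sysctl --system

  # To reproduce the regression Tom describes, drop the dentry/inode caches:
  sync; echo 2 > /proc/sys/vm/drop_caches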
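Until the root cause is pinned down, the observations in this thread suggest an interim workaround: warm the caches with an ls/find crawl of the target directory before (or while) running rsync, since those commands issue readdirp() and bring the stat data into cache. A rough sketch, assuming the volume is FUSE-mounted at /mnt/gluster; all paths and rsync arguments are placeholders.

  # Crawl the tree on the gluster mount first; ls/find trigger readdirp(),
  # which returns stat data along with the entries and warms the caches.
  find /mnt/gluster/uploads -mindepth 1 > /dev/null 2>&1

  # Then run the transfer; the lstat() calls rsync makes should now be
  # answered from cache instead of going to the bricks one by one.
  rsync -a /src/uploads/ /mnt/gluster/uploads/

  # Optional: watch the lstat() rate to confirm, as Artem did with strace.
  # strace -f -T -e trace=lstat rsync -a /src/uploads/ /mnt/gluster/uploads/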
Ingard Mevåg
2018-Feb-27 14:15 UTC
[Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first
We got extremely slow stat calls on our disperse cluster running the latest
3.12, with clients also running 3.12. When we downgraded the clients to
3.10, the slow stat problem went away.

We later found out that by disabling disperse.eager-lock we could run the
3.12 clients without much issue (writes are a little bit slower). There is
an open issue on this here: https://bugzilla.redhat.com/show_bug.cgi?id=1546732
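For anyone on a disperse volume who wants to try the workaround Ingard describes, the option is toggled per volume with the gluster CLI. A sketch, with gv0 standing in for your volume name:

  # Check the current value of the option.
  gluster volume get gv0 disperse.eager-lock

  # Disable eager-lock; per the report above this avoided the slow stat
  # calls with 3.12 clients, at the cost of somewhat slower writes.
  gluster volume set gv0 disperse.eager-lock off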
Artem Russakovskii
2018-Apr-18 04:31 UTC
[Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first
Nithya, Amar,

Any movement here? There could be a significant performance gain here that
may also affect other bottlenecks I'm experiencing which make gluster close
to unusable at times.

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii <https://plus.google.com/+ArtemRussakovskii>
| @ArtemR <http://twitter.com/ArtemR>

On Tue, Feb 27, 2018 at 6:15 AM, Ingard Mevåg <ingard at jotta.no> wrote:

> We got extremely slow stat calls on our disperse cluster running the
> latest 3.12, with clients also running 3.12. When we downgraded the
> clients to 3.10, the slow stat problem went away.
>
> We later found out that by disabling disperse.eager-lock we could run the
> 3.12 clients without much issue (writes are a little bit slower).
> There is an open issue on this here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1546732