thr3ads.net - Gluster users - [Gluster-users] Global threading [Mar 2021]

Zenon Panoussis

2021-Mar-05 15:47 UTC

[Gluster-users] Global threading

Some time ago I created a replica 3 volume using gluster 8.3
with the following topology for the time being:

server1/brick1 ----\                          /---- server3/brick3
                    \____ ADSL 10/1 Mbits ___/
                    /     <- down   up ->    \
server2/brick2 ----/                          \---- old storage


The connection between the two boxes at each end is 1Gbit.
The distance between the two sides is about 4000 km and the
latency is roughly 250 ms.

For the past month and a half I have been running one rsync
on each of the three servers to fetch different parts of a
mail store from "old storage". The mail store consists of
about 1.1 million predominantly small files very unevenly
spread over 6600 directories. Some directories contain 30000+
files, the worst one has 90000+.

Copying simultaneously to all three servers wastes traffic
(what is rsynced to server1 and server2 has to travel down
from old storage and then back up again to server3), but
uses the available bandwidth more efficiently (by using
both directions instead of only down, as the case would be
if I only rsynced to server3 and let the replication flow
down to servers 1 and 2). I did this because, as I mentioned
earlier in the thread "Replication logic", I cannot saturate
any of CPU, disk I/O or even the meager network. This way
the waste of traffic increases the overall speed of copying.

Diagnostics showed that FSYNC had by far the greatest average
latency, followed by MKDIR and CREATE, but they all had
relatively few calls. LOOKUP is what has a huge number of
calls so, even with a moderate average latency, it accounts
for the greatest overall delay, followed by INODELK.
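
To make that arithmetic concrete: the accumulated delay of an operation
is its call count multiplied by its average latency. The sketch below
ranks operations that way. The numbers in it are invented purely for
illustration and are not measurements from this volume; only the
operation names come from the diagnostics described above.

```python
# Illustrative numbers only: these are NOT measurements from this volume.
# Each entry maps an operation to (number of calls, average latency in microseconds).
sample_profile = {
    "FSYNC": (2_000, 900_000.0),
    "MKDIR": (6_600, 600_000.0),
    "CREATE": (50_000, 500_000.0),
    "LOOKUP": (5_000_000, 60_000.0),
    "INODELK": (1_500_000, 50_000.0),
}


def total_seconds(profile):
    """Return {operation: accumulated time in seconds} as calls * average latency."""
    return {fop: calls * avg_us / 1e6 for fop, (calls, avg_us) in profile.items()}


def rank_by_total(profile):
    """Return operation names sorted by accumulated time, largest first."""
    totals = total_seconds(profile)
    return sorted(totals, key=totals.get, reverse=True)


if __name__ == "__main__":
    totals = total_seconds(sample_profile)
    for fop in rank_by_total(sample_profile):
        print(f"{fop:8s} {totals[fop]:12.0f} s")
```

With inputs shaped like these, an operation with a moderate average
latency but millions of calls ends up at the top of the ranking, ahead
of the operations with the highest per-call latency.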

I tested writing both to glusterfs and nfs-ganesha, but
didn't notice any difference between them in speed (however,
nfs-ganesha used seven times more memory than glusterfsd).
Tweaking threads, write-behind, parallel-readdir, cache-size
and inode-lru-limit didn't produce any noticeable difference
either.

Then a few days ago I noticed global-threading at
https://github.com/gluster/glusterfs/issues/532 . It
looked promising but appeared not to be merged; it turned
out that it actually is merged. So last night I upgraded to 9.0
and turned it on. I also dumped nfs-ganesha. With that,
my configuration ended up like this:

Volume Name: gv0
Type: Replicate
Volume ID: 2786efab-9178-4a9a-a525-21d6f1c94de9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/gfs/gv0
Brick2: node2:/gfs/gv0
Brick3: node3:/gfs/gv0
Options Reconfigured:
cluster.granular-entry-heal: enable
network.ping-timeout: 20
network.frame-timeout: 60
performance.write-behind: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.bitrot: off
features.scrub: Inactive
features.scrub-freq: weekly
performance.io-thread-count: 32
features.selinux: off
client.event-threads: 3
server.event-threads: 3
cluster.min-free-disk: 1%
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.cache-size: 256MB
network.inode-lru-limit: 131072
performance.parallel-readdir: on
performance.qr-cache-timeout: 600
performance.nl-cache-positive-entry: on
performance.nfs.io-threads: on
config.global-threading: on
performance.iot-pass-through: on

In the short time it's been running since, I saw no
subjectively noticeable increase in the speed of
writing, but I do see some increase in the speed of
file listing (that is, the speed at which rsync
without --whole-file will run through preexisting
files while reporting "file X is uptodate"). This
is presumably stat working faster because of thread
parallelisation, but I'm only guessing. The network
still does not get saturated except during the
transfer of some occasional big (5MB+) files. So
far I have seen no negative impact of turning global
threading on compared to previously.

Any and all ideas on how to improve this setup (other
than physically) are most welcome.

Xavi Hernandez

2021-Mar-11 17:31 UTC

[Gluster-users] Global threading

Hi Zenon,

The main issue with global threading is that it's not regularly tested,
so it could have unknown bugs. Besides that, are you using it on both the
client and the bricks, or only on the client?

I think the main problem with rsync is that it's mostly a sequential
program that does many small requests. In this case it's hard to saturate
the network because the roundtrip latency of sequential operations is what
dominates.
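
A rough back-of-the-envelope sketch of why latency dominates: if every
file costs a fixed number of roundtrips and they are issued strictly one
after another, the total time is files * roundtrips * roundtrip time,
regardless of bandwidth. The file count is the one reported above. The
roundtrips-per-file figure is a made-up assumption, and treating the
reported 250 ms as the roundtrip time is an assumption as well.

```python
def sequential_seconds(files, roundtrips_per_file, rtt_seconds):
    """Lower bound on wall-clock time when every roundtrip waits for the previous one."""
    return files * roundtrips_per_file * rtt_seconds


if __name__ == "__main__":
    files = 1_100_000        # file count reported in the original message
    roundtrips_per_file = 4  # hypothetical: not measured, pick your own estimate
    rtt_seconds = 0.25       # assumes the reported 250 ms is the roundtrip time

    seconds = sequential_seconds(files, roundtrips_per_file, rtt_seconds)
    print(f"{seconds:.0f} s, about {seconds / 86400:.1f} days")
```

With those assumed inputs the bound comes to 1,100,000 seconds, which is
close to 13 days spent purely waiting on the link.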

To improve that you could try to run several rsync processes in parallel.
That should make better use of the bandwidth. Gluster normally works better
with parallel operations. It's not so good with single sequential
operations.
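
One possible way to do that, as a minimal sketch: start one rsync per
top-level directory and keep a fixed number of them running at once.
The source and destination roots, the directory names and the worker
count are all hypothetical placeholders, and "-a" is just an example
flag; adjust them to the real layout.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_rsync_command(src_root, dst_root, subdir):
    """Build the rsync argument list that copies one subdirectory."""
    # Trailing slashes make rsync copy the directory contents into the target.
    return ["rsync", "-a", f"{src_root}/{subdir}/", f"{dst_root}/{subdir}/"]


def run_parallel(commands, workers=4, runner=subprocess.run):
    """Run the commands with at most `workers` in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, commands))


if __name__ == "__main__":
    # Hypothetical paths and directory names, for illustration only.
    subdirs = ["dir-a", "dir-b", "dir-c"]
    commands = [
        build_rsync_command("oldstorage:/mail", "/mnt/gv0/mail", d) for d in subdirs
    ]
    for result in run_parallel(commands, workers=3):
        print(result.args[-1], "exit code", result.returncode)
```

Splitting by directory is only one option. With a tree as uneven as the
one described, a few very large directories would still be copied by a
single rsync each.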

Another thing you could try is to increase the kernel cache timeouts
using the "entry-timeout" and "attribute-timeout" mount options. By default
they are set to 1 second. A higher value could help reduce the number of
lookups. However, this could delay the detection of changes or, in the worst
case, even create inconsistencies. It should only be used when there's a
single fuse mount using the volume. As with the global threading feature,
higher values here have not been tested, so they could cause other
unexpected problems.
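
As a sketch of what such a mount could look like, the snippet below
builds (but does not run) a fuse mount command with both timeouts
raised. The server and volume names are taken from the volume info
earlier in the thread. The mount point and the value of 60 seconds are
arbitrary assumptions, not recommendations.

```python
import shlex


def fuse_mount_command(server, volume, mountpoint, entry_timeout, attribute_timeout):
    """Build a 'mount -t glusterfs' argument list with kernel cache timeouts in seconds."""
    options = f"entry-timeout={entry_timeout},attribute-timeout={attribute_timeout}"
    return ["mount", "-t", "glusterfs", "-o", options, f"{server}:/{volume}", mountpoint]


if __name__ == "__main__":
    # /mnt/gv0 and the 60 second timeouts are hypothetical example values.
    argv = fuse_mount_command("node1", "gv0", "/mnt/gv0", 60, 60)
    print(shlex.join(argv))
```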

Regards,

Xavi