Ondrej Valousek
2018-Mar-19 09:42 UTC
[Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)
Hi,

As I posted in my previous emails - glusterfs can never match NFS (especially async one) performance of small files/latency. That's given by the design. Nothing you can do about it.

Ondrej

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Rik Theys
Sent: Monday, March 19, 2018 10:38 AM
To: gluster-users at gluster.org; mailinglists at smcleod.net
Subject: Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

Hi,

I've done some similar tests and experience similar performance issues (see my 'gluster for home directories?' thread on the list).

If I read your mail correctly, you are comparing an NFS mount of the brick disk against a gluster mount (using the fuse client)?

Which options do you have set on the NFS export (sync or async)?

From my tests, I concluded that the issue was not bandwidth but latency. Gluster will only return an IO operation once all bricks have confirmed that the data is on disk. If you are using a fuse mount, comparing with the 'direct-io-mode=disable' option on the client might help (no experience with this).

In our tests, I've used NFS-ganesha to serve the gluster volume over NFS. This makes things even worse, as NFS-ganesha has no "async" mode, which makes performance terrible.

If you find a magic knob to make glusterfs fast on small-file workloads, do let me know!

Regards,

Rik

On 03/18/2018 11:13 PM, Sam McLeod wrote:
> Howdy all,
>
> We're experiencing terrible small file performance when copying or
> moving files on gluster clients.
>
> In the example below, Gluster is taking ~6 minutes to copy 128MB / 21,000
> files sideways on a client; doing the same thing on NFS (which I know
> is a totally different solution etc. etc.) takes approximately 10-15
> seconds(!).
>
> Any advice for tuning the volume or XFS settings would be greatly
> appreciated.
>
> Hopefully I've included enough relevant information below.
>
>
> ## Gluster Client
>
> root at gluster-client:/mnt/gluster_perf_test/ # du -sh .
> 127M    .
> root at gluster-client:/mnt/gluster_perf_test/ # find . -type f | wc -l
> 21791
> root at gluster-client:/mnt/gluster_perf_test/ # du 9584toto9584.txt
> 4    9584toto9584.txt
>
>
> root at gluster-client:/mnt/gluster_perf_test/ # time cp -a private private_perf_test
>
> real    5m51.862s
> user    0m0.862s
> sys     0m8.334s
>
> root at gluster-client:/mnt/gluster_perf_test/ # time rm -rf private_perf_test/
>
> real    0m49.702s
> user    0m0.087s
> sys     0m0.958s
>
>
> ## Hosts
>
> - 16x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz per Gluster host / client
> - Storage: iSCSI provisioned (via 10Gbit DAC/Fibre), SSD disk, 50K R/RW 4k IOP/s, 400MB/s per Gluster host
> - Volumes are replicated across two hosts and one arbiter-only host
> - Networking is 10Gbit DAC/Fibre between Gluster hosts and clients
> - 18GB DDR4 ECC memory
>
> ## Volume Info
>
> root at gluster-host-01:~ # gluster pool list
> UUID                                  Hostname              State
> ad02970b-e2aa-4ca8-998c-bd10d5970faa  gluster-host-02.fqdn  Connected
> ea116a94-c19e-48db-b108-0be3ae622e2e  gluster-host-03.fqdn  Connected
> 2e855c25-e7ac-4ff6-be85-e8bcc6f45ee4  localhost             Connected
>
> root at gluster-host-01:~ # gluster volume info uat_storage
>
> Volume Name: uat_storage
> Type: Replicate
> Volume ID: 7918f1c5-5031-47b8-b054-56f6f0c569a2
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster-host-01.fqdn:/mnt/gluster-storage/uat_storage
> Brick2: gluster-host-02.fqdn:/mnt/gluster-storage/uat_storage
> Brick3: gluster-host-03.fqdn:/mnt/gluster-storage/uat_storage (arbiter)
> Options Reconfigured:
> performance.rda-cache-limit: 256MB
> network.inode-lru-limit: 50000
> server.outstanding-rpc-limit: 256
> performance.client-io-threads: true
> nfs.disable: on
> transport.address-family: inet
> client.event-threads: 8
> cluster.eager-lock: true
> cluster.favorite-child-policy: size
> cluster.lookup-optimize: true
> cluster.readdir-optimize: true
> cluster.use-compound-fops: true
> diagnostics.brick-log-level: ERROR
> diagnostics.client-log-level: ERROR
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: true
> network.ping-timeout: 15
> performance.cache-invalidation: true
> performance.cache-max-file-size: 6MB
> performance.cache-refresh-timeout: 60
> performance.cache-size: 1024MB
> performance.io-thread-count: 16
> performance.md-cache-timeout: 600
> performance.stat-prefetch: true
> performance.write-behind-window-size: 256MB
> server.event-threads: 8
> transport.listen-backlog: 2048
>
> root at gluster-host-01:~ # xfs_info /dev/mapper/gluster-storage-unlocked
> meta-data=/dev/mapper/gluster-storage-unlocked isize=512    agcount=4, agsize=196607360 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=0 spinodes=0
> data     =                       bsize=4096   blocks=786429440, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=383998, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
>
> --
> Sam McLeod (protoporpoise on IRC)
> https://smcleod.net
> https://twitter.com/s_mcleod
>
> Words are my own opinions and do not necessarily represent those of my
> employer or partners.
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>

-----
The information contained in this e-mail and in any attachments is confidential and is designated solely for the attention of the intended recipient(s). If you are not an intended recipient, you must not use, disclose, copy, distribute or retain this e-mail or any part thereof. If you have received this e-mail in error, please notify the sender by return e-mail and delete all copies of this e-mail from your computer system(s). Please direct any additional queries to: communications at s3group.com. Thank You. Silicon and Software Systems Limited (S3 Group). Registered in Ireland no. 378073. Registered Office: South County Business Park, Leopardstown, Dublin 18.
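Rik's latency point is easy to reproduce: with thousands of small files, the per-file round trips (lookup, create, write, close) dominate, so total time scales with file count rather than bytes. Below is a minimal shell sketch of such a microbenchmark. The target path and file count are placeholders, and the glusterfs mount line in the comment (including the direct-io-mode=disable option Rik mentions) is illustrative, not verified here.

```shell
#!/bin/sh
# Small-file latency microbenchmark (sketch). Point TARGET at a directory
# on the filesystem under test, e.g. a fuse mount created with something
# like (hypothetical server/volume names):
#   mount -t glusterfs -o direct-io-mode=disable server:/volume /mnt/test
TARGET="${1:-/tmp/smallfile_test}"
COUNT="${2:-200}"

mkdir -p "$TARGET"
start=$(date +%s)
i=0
while [ "$i" -lt "$COUNT" ]; do
    # each 4 KB file costs a create + write + close round trip
    dd if=/dev/zero of="$TARGET/f$i" bs=4096 count=1 2>/dev/null
    i=$((i + 1))
done
end=$(date +%s)
echo "$COUNT files in $((end - start))s"
```

Running the same script against a local directory and against the gluster mount makes the latency gap visible even when raw throughput looks healthy.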
TomK
2018-Mar-19 14:42 UTC
[Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)
On 3/19/2018 5:42 AM, Ondrej Valousek wrote:

Removing NFS or NFS Ganesha from the equation, I'm not very impressed with my own setup either. For the writes it's doing, that's a lot of CPU usage in top. It seems bottlenecked on a single execution core somewhere while facilitating reads/writes to the other bricks.

Writes to the gluster FS from within one of the gluster participating bricks:

[root at nfs01 n]# dd if=/dev/zero of=./some-file.bin
393505+0 records in
393505+0 records out
201474560 bytes (201 MB) copied, 50.034 s, 4.0 MB/s
[root at nfs01 n]#

Top results (10-second average) won't go over 32%:

top - 00:49:38 up 21:39, 2 users, load average: 0.42, 0.24, 0.19
Tasks: 164 total, 1 running, 163 sleeping, 0 stopped, 0 zombie
%Cpu0 : 29.3 us, 24.7 sy, 0.0 ni, 45.1 id, 0.0 wa, 0.0 hi, 0.8 si, 0.0 st
%Cpu1 : 27.2 us, 24.1 sy, 0.0 ni, 47.2 id, 0.0 wa, 0.0 hi, 1.5 si, 0.0 st
%Cpu2 : 20.2 us, 13.5 sy, 0.0 ni, 64.1 id, 0.0 wa, 0.0 hi, 2.3 si, 0.0 st
%Cpu3 : 30.0 us, 16.2 sy, 0.0 ni, 47.5 id, 0.0 wa, 0.0 hi, 6.3 si, 0.0 st
KiB Mem : 3881708 total, 3207488 free, 346680 used, 327540 buff/cache
KiB Swap: 4063228 total, 4062828 free, 400 used. 3232208 avail Mem

  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
 1319 root  20  0  819036 12928 4036 S 32.3  0.3  1:19.64 glusterfs
 1310 root  20  0 1232428 25636 4364 S 12.1  0.7  0:41.25 glusterfsd

Next, the same write but directly to the brick via XFS, which of course is faster:

top - 09:45:09 up 1 day, 6:34, 3 users, load average: 0.61, 1.01, 1.04
Tasks: 171 total, 2 running, 169 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.6 us, 2.1 sy, 0.0 ni, 82.6 id, 14.5 wa, 0.0 hi, 0.2 si, 0.0 st
%Cpu1 : 16.7 us, 83.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.4 us, 0.9 sy, 0.0 ni, 94.2 id, 4.4 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 1.1 us, 0.6 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 3881708 total, 501120 free, 230704 used, 3149884 buff/cache
KiB Swap: 4063228 total, 3876896 free, 186332 used. 3343960 avail Mem

  PID USER  PR NI    VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
14691 root  20  0  107948   608  512 R 25.0  0.0   0:34.29 dd
 1334 root  20  0 2694264 61076 2228 S  2.7  1.6 283:55.96 ganesha.nfsd

The result of a dd command directly against the brick FS itself is of course much better:

[root at nfs01 gv01]# dd if=/dev/zero of=./some-file.bin
5771692+0 records in
5771692+0 records out
2955106304 bytes (3.0 GB) copied, 35.3425 s, 83.6 MB/s
[root at nfs01 gv01]# pwd
/bricks/0/gv01
[root at nfs01 gv01]#

Tried a few tweak options with no effect:

[root at nfs01 glusterfs]# gluster volume info

Volume Name: gv01
Type: Replicate
Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs01:/bricks/0/gv01
Brick2: nfs02:/bricks/0/gv01
Options Reconfigured:
cluster.server-quorum-type: server
cluster.quorum-type: auto
server.event-threads: 8
client.event-threads: 8
performance.readdir-ahead: on
performance.write-behind-window-size: 8MB
performance.io-thread-count: 16
performance.cache-size: 1GB
nfs.trusted-sync: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
[root at nfs01 glusterfs]#

That's despite the fact that I can confirm doing 90+ MB/s on my 1GbE network.

Thoughts?

--
Cheers,
Tom K.
-------------------------------------------------------------------------------------
Living on earth is expensive, but it includes a free trip around the sun.

> Hi,
> As I posted in my previous emails - glusterfs can never match NFS (especially async one) performance of small files/latency. That's given by the design.
> Nothing you can do about it.
> Ondrej
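TomK's suspicion of a single-core bottleneck can be checked more directly than with top's per-process view by sampling per-thread CPU ticks from /proc. This is a generic Linux sketch; defaulting to the current shell is only for illustration, and the TARGET_PID lookup via pidof in the comment is an assumption — on a brick you would point it at the glusterfs client process instead.

```shell
#!/bin/sh
# Print cumulative CPU ticks (utime + stime) per thread of a process;
# one thread far ahead of the rest suggests a single-core bottleneck.
# TARGET_PID is hypothetical, e.g. TARGET_PID=$(pidof glusterfs).
pid="${TARGET_PID:-$$}"
for t in /proc/"$pid"/task/*/stat; do
    # fields 14 and 15 of the stat line are utime and stime; this simple
    # parse assumes the thread name in field 2 contains no spaces
    awk '{printf "tid %s cpu_ticks %d\n", $1, $14 + $15}' "$t"
done
```

Sampling twice, a second apart, and diffing the tick counts gives per-thread CPU usage over the interval.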
Rik Theys
2018-Mar-19 14:52 UTC
[Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)
Hi,

On 03/19/2018 03:42 PM, TomK wrote:
> On 3/19/2018 5:42 AM, Ondrej Valousek wrote:
> Removing NFS or NFS Ganesha from the equation, not very impressed on my
> own setup either. For the writes it's doing, that's a lot of CPU usage
> in top. Seems bottlenecked via a single execution core somewhere trying
> to facilitate reads/writes to the other bricks.
>
> Writes to the gluster FS from within one of the gluster participating
> bricks:
>
> [root at nfs01 n]# dd if=/dev/zero of=./some-file.bin
>
> 393505+0 records in
> 393505+0 records out
> 201474560 bytes (201 MB) copied, 50.034 s, 4.0 MB/s

That's not really a fair comparison, as you don't specify a block size. What does

dd if=/dev/zero of=./some-file.bin bs=1M count=1000 oflag=direct

give?

Rik

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>
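Rik's objection is borne out by the numbers in TomK's paste: when bs= is omitted, GNU dd defaults to 512-byte blocks, and 201474560 bytes over 393505 records is exactly 512. A quick sanity check of that arithmetic, followed by the large-block form of the benchmark (writing to a local placeholder path, not TomK's brick):

```shell
# Derive the block size dd actually used from TomK's reported figures
bytes=201474560
records=393505
echo "implied block size: $((bytes / records)) bytes"   # prints 512

# A fairer benchmark writes large blocks; on a real disk add
# oflag=direct (as Rik suggests) to bypass the page cache. It is
# omitted here because tmpfs and some filesystems reject O_DIRECT.
dd if=/dev/zero of=/tmp/dd_bs_test.bin bs=1M count=16 2>/dev/null
```

With 512-byte writes, every block is a separate trip through the fuse client and the replication path, which is consistent with the 4.0 MB/s TomK measured.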