David Spisla
2019-Nov-06 10:12 UTC
[Gluster-users] Performance is falling rapidly when updating from v5.5 to v7.0
Hello Rafi,

I tried to set the xattr via

    setfattr -n trusted.io-stats-dump -v '/tmp/iostat.log' /gluster/repositories/repo1/

but it had no effect. There is no such xattr visible via getfattr and no logfile
was created. The command setxattr is not available. What am I doing wrong?

By the way, did you mean to increase the inode size of the XFS layer from
512 bytes to 1024KB(!)? I think it should be 1024 bytes, because 2048 bytes
is the maximum.

Regards
David

Am Mi., 6. Nov. 2019 um 04:10 Uhr schrieb RAFI KC <rkavunga at redhat.com>:

> I will take a look at the profile info shared. Since there is a huge
> difference in the performance numbers between fuse and Samba, it would be
> great if we can get the profile info of fuse (on v7). This will help to
> compare the number of calls for each fop. There should be some fops that
> Samba repeats, and we can find them by comparing with fuse.
>
> Also, if possible, can you please get client profile info from the fuse
> mount using the command `setxattr -n trusted.io-stats-dump -v
> <logfile, e.g. /tmp/iostat.log> <mount point, e.g. /mnt/fuse>`?
>
> Regards
>
> Rafi KC
>
> On 11/5/19 11:05 PM, David Spisla wrote:
>
> I did the test with Gluster 7.0 and ctime disabled, but it had no effect
> (all values in MiB/s):
>
>   64KiB    1MiB    10MiB
>   0,16     2,60    54,74
>
> Attached is now the complete profile file, also with the results from
> the last test. I will not repeat it with a higher inode size because I
> don't think this will have an effect.
> There must be another cause for the low performance.
>
> Yes. No need to try with a higher inode size.
>
> Regards
> David Spisla
>
> Am Di., 5. Nov. 2019 um 16:25 Uhr schrieb David Spisla <spisla80 at gmail.com>:
>
>> Am Di., 5. Nov. 2019 um 12:06 Uhr schrieb RAFI KC <rkavunga at redhat.com>:
>>
>>> On 11/4/19 8:46 PM, David Spisla wrote:
>>>
>>> Dear Gluster Community,
>>>
>>> I also have an issue concerning performance. In the last days I updated
>>> our test cluster from GlusterFS v5.5 to v7.0.
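A note on the xattr-based client profiling discussed above: `setxattr` in the quoted mail is not an actual command line tool; the usual invocation uses `setfattr` against the FUSE mount point (not a brick backend path). A minimal sketch, assuming a hypothetical FUSE mount at /mnt/fuse:

```shell
# Trigger an io-stats dump on the FUSE mount point (assumed: /mnt/fuse).
# The xattr value names the file the client writes its statistics to.
# The xattr is virtual (intercepted by the io-stats translator), so it
# will not show up in getfattr output afterwards.
setfattr -n trusted.io-stats-dump -v /tmp/iostat.log /mnt/fuse

# Check whether the dump was written:
ls -l /tmp/iostat.log
```

Running it against a brick directory instead of a Gluster mount point would explain seeing no effect, since the xattr is only interpreted by the client-side translator stack.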
The setup in general:
>>>
>>> 2 HP DL380 servers with 10Gbit NICs, 1 Distribute-Replica 2 volume with
>>> 2 replica pairs. Client is SMB Samba (access via vfs_glusterfs). I did
>>> several tests to ensure that Samba doesn't cause the drop.
>>> The setup is completely the same except for the Gluster version.
>>> Here are my results:
>>>
>>>   64KiB    1MiB     10MiB     (file size)
>>>   3,49     47,41    300,50    (values in MiB/s with GlusterFS v5.5)
>>>   0,16     2,61     76,63     (values in MiB/s with GlusterFS v7.0)
>>>
>>> Can you please share the profile information [1] for both versions?
>>> Also it would be really helpful if you can mention the io patterns that
>>> were used for these tests.
>>>
>>> [1]:
>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Monitoring%20Workload/
>>
>> Hello Rafi,
>> thank you for your help.
>>
>> * First, more information about the io patterns: As a client we use a
>> DL360 Windows Server 2017 machine with a 10Gbit NIC connected to the
>> storage machines. The share is mounted via SMB and the tests write with
>> fio. We use these job files (see attachment). Each job file is executed
>> separately and there is a sleep of about 60s between test runs to calm
>> down the system before starting a new test.
>>
>> * Attached below you find the profile output from the tests with v5.5
>> (ctime enabled) and v7.0 (ctime enabled).
>>
>> * Besides the tests with Samba I also did some fio tests directly on
>> the FUSE mounts (locally on one of the storage nodes). The results show
>> only a small decrease in performance between v5.5 and v7.0
>> (all values in MiB/s):
>>
>>   64KiB    1MiB      10MiB
>>   50,09    679,96    1023,02    (v5.5)
>>   47,00    656,46    977,60     (v7.0)
>>
>> It seems that the combination of Samba + Gluster 7.0 has a lot of
>> problems, doesn't it?
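The fio job files themselves are only referenced as attachments and are not part of this mail. A sequential-write job of the kind described (one small file size per job, run from the Windows client against the SMB share) might look roughly like this; the ioengine, drive letter, and file count are assumptions for illustration, not the actual attached jobs:

```ini
; Hypothetical fio job for the 64KiB test (the real job files were
; attachments to the original mail and are not reproduced here).
[global]
ioengine=windowsaio      ; client is a Windows machine writing to the share
direct=1
rw=write
bs=64k

[smallfile-64k]
directory=Z:\testdir     ; assumed drive letter of the mounted SMB share
nrfiles=1000             ; assumed file count
filesize=64k             ; one 64 KiB file each
```

The 1MiB and 10MiB runs would be the same job with `bs`/`filesize` adjusted.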
>>> We use these volume options (GlusterFS 7.0):
>>>
>>> Volume Name: archive1
>>> Type: Distributed-Replicate
>>> Volume ID: 44c17844-0bd4-4ca2-98d8-a1474add790c
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 2 x 2 = 4
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: fs-dl380-c1-n1:/gluster/brick1/glusterbrick
>>> Brick2: fs-dl380-c1-n2:/gluster/brick1/glusterbrick
>>> Brick3: fs-dl380-c1-n1:/gluster/brick2/glusterbrick
>>> Brick4: fs-dl380-c1-n2:/gluster/brick2/glusterbrick
>>> Options Reconfigured:
>>> performance.client-io-threads: off
>>> nfs.disable: on
>>> storage.fips-mode-rchecksum: on
>>> transport.address-family: inet
>>> user.smb: disable
>>> features.read-only: off
>>> features.worm: off
>>> features.worm-file-level: on
>>> features.retention-mode: enterprise
>>> features.default-retention-period: 120
>>> network.ping-timeout: 10
>>> features.cache-invalidation: on
>>> features.cache-invalidation-timeout: 600
>>> performance.nl-cache: on
>>> performance.nl-cache-timeout: 600
>>> client.event-threads: 32
>>> server.event-threads: 32
>>> cluster.lookup-optimize: on
>>> performance.stat-prefetch: on
>>> performance.cache-invalidation: on
>>> performance.md-cache-timeout: 600
>>> performance.cache-samba-metadata: on
>>> performance.cache-ima-xattrs: on
>>> performance.io-thread-count: 64
>>> cluster.use-compound-fops: on
>>> performance.cache-size: 512MB
>>> performance.cache-refresh-timeout: 10
>>> performance.read-ahead: off
>>> performance.write-behind-window-size: 4MB
>>> performance.write-behind: on
>>> storage.build-pgfid: on
>>> features.ctime: on
>>> cluster.quorum-type: fixed
>>> cluster.quorum-count: 1
>>> features.bitrot: on
>>> features.scrub: Active
>>> features.scrub-freq: daily
>>>
>>> For GlusterFS 5.5 it's nearly the same, except that there were 2
>>> options to enable the ctime feature.
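For reference, the server-side profile information requested earlier in the thread is collected with the `gluster volume profile` commands; a sketch, using the volume name archive1 from the listing above:

```shell
# Start server-side profiling for the volume (name taken from the
# volume info above), run the benchmark, capture the counters, stop.
gluster volume profile archive1 start
# ... run the fio workload against the SMB share or FUSE mount ...
gluster volume profile archive1 info > /tmp/profile_archive1_v7.txt
gluster volume profile archive1 stop
```

Capturing one such file per Gluster version (v5.5 and v7.0) under the same workload is what allows the per-fop call counts to be compared.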
>>> Ctime stores additional metadata information as extended attributes,
>>> which sometimes exceed the default inode size. In such scenarios the
>>> additional xattrs won't fit into the default size. This results in
>>> additional blocks being used to store xattrs outside the inode, which
>>> will affect the latency. This is purely dependent on the i/o operations
>>> and the total xattr size stored in the inode.
>>>
>>> Is it possible for you to repeat the test by disabling ctime or
>>> increasing the inode size to a higher value, say 1024KB?
>>
>> I will do so, but for today I could not finish the tests with ctime
>> disabled (or a higher inode value) because they take a lot of time with
>> v7.0 due to the low performance; I will perform them tomorrow. As soon
>> as possible I will give you the results.
>> By the way: Do you really mean an inode size of 1024KB on the XFS layer?
>> Or do you mean 1024 bytes? We use 512 bytes per default, because this is
>> the recommended size until now. But it seems there is a need for a new
>> recommendation when using the ctime feature as a default. I cannot
>> imagine that this is the real cause of the low performance, because in
>> v5.5 we also use the ctime feature with an inode size of 512 bytes.
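On the XFS inode size question: the size can be inspected at any time, but it can only be chosen when the filesystem is created. A sketch, with the brick mount path taken from the volume info and a placeholder device node:

```shell
# Inspect the current inode size of a brick filesystem; look for the
# "isize=" field in the output (512 is the value discussed here):
xfs_info /gluster/brick1

# The inode size is fixed at mkfs time. Reformatting a brick with
# 1024-byte inodes (DESTRUCTIVE - wipes the brick) would look like:
mkfs.xfs -f -i size=1024 /dev/sdX1    # /dev/sdX1 is a placeholder device
```

This matches the later correction in the thread: valid XFS inode sizes are powers of two between 256 bytes and 2048 bytes, so "1024KB" can only have meant 1024 bytes.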
>> Regards
>> David
>>
>>> Our optimization for Samba looks like this (for every version):
>>>
>>> [global]
>>> workgroup = SAMBA
>>> netbios name = CLUSTER
>>> kernel share modes = no
>>> aio read size = 1
>>> aio write size = 1
>>> kernel oplocks = no
>>> max open files = 100000
>>> nt acl support = no
>>> security = user
>>> server min protocol = SMB2
>>> store dos attributes = no
>>> strict locking = no
>>> full_audit:failure = pwrite_send pwrite_recv pwrite offload_write_send
>>> offload_write_recv create_file open unlink connect disconnect rename chown
>>> fchown lchown chmod fchmod mkdir rmdir ntimes ftruncate fallocate
>>> full_audit:success = pwrite_send pwrite_recv pwrite offload_write_send
>>> offload_write_recv create_file open unlink connect disconnect rename chown
>>> fchown lchown chmod fchmod mkdir rmdir ntimes ftruncate fallocate
>>> full_audit:facility = local5
>>> durable handles = yes
>>> posix locking = no
>>> log level = 2
>>> max log size = 100000
>>> debug pid = yes
>>>
>>> What can be the cause of this rapid drop in performance for small
>>> files? Are some of our vol options not recommended anymore?
>>> There were some patches concerning performance for small files in v6.0
>>> and v7.0:
>>>
>>> #1670031 <https://bugzilla.redhat.com/1670031>: performance regression
>>> seen with smallfile workload tests
>>>
>>> #1659327 <https://bugzilla.redhat.com/1659327>: 43% regression in
>>> small-file sequential read performance
>>>
>>> And one patch for the io-cache:
>>>
>>> #1659869 <https://bugzilla.redhat.com/1659869>: improvements to io-cache
>>>
>>> Regards
>>>
>>> David Spisla
>>>
>>> ________
>>>
>>> Community Meeting Calendar:
>>>
>>> APAC Schedule -
>>> Every 2nd and 4th Tuesday at 11:30 AM IST
>>> Bridge: https://bluejeans.com/118564314
>>>
>>> NA/EMEA Schedule -
>>> Every 1st and 3rd Tuesday at 01:00 PM EDT
>>> Bridge: https://bluejeans.com/118564314
>>>
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20191106/89324bd1/attachment.html>
RAFI KC
2019-Nov-06 10:16 UTC
[Gluster-users] Performance is falling rapidly when updating from v5.5 to v7.0
On 11/6/19 3:42 PM, David Spisla wrote:

> Hello Rafi,
>
> I tried to set the xattr via
>
>     setfattr -n trusted.io-stats-dump -v '/tmp/iostat.log' /gluster/repositories/repo1/
>
> but it had no effect. There is no such xattr via getfattr and no
> logfile. The command setxattr is not available. What am I doing wrong?

I will check it out and get back to you.

> By the way, did you mean to increase the inode size of the XFS layer
> from 512 bytes to 1024KB(!)? I think it should be 1024 bytes, because
> 2048 bytes is the maximum.

It was a typo, I meant 1024 bytes, sorry for that.

> Regards
> David

[...]