David Spisla
2019-Nov-06 10:12 UTC
[Gluster-users] Performance is falling rapidly when updating from v5.5 to v7.0
Hello Rafi,

I tried to set the xattr via

    setfattr -n trusted.io-stats-dump -v '/tmp/iostat.log' /gluster/repositories/repo1/

but it had no effect. There is no such xattr visible via getfattr and no logfile
was created. The command setxattr is not available. What am I doing wrong?

By the way, did you mean to increase the inode size of the XFS layer from
512 bytes to 1024KB(!)? I think it should be 1024 bytes, because 2048 bytes
is the maximum.

Regards
David

Am Mi., 6. Nov. 2019 um 04:10 Uhr schrieb RAFI KC <rkavunga at redhat.com>:

> I will take a look at the profile info shared. Since there is a huge
> difference in the performance numbers between fuse and Samba, it would be
> great if we can get the profile info of fuse (on v7). This will help to
> compare the number of calls for each fop. There should be some fops that
> Samba repeats, and we can find them by comparing with fuse.
>
> Also, if possible, can you please get client profile info from the fuse
> mount using the command `setxattr -n trusted.io-stats-dump -v
> <logfile, e.g. /tmp/iostat.log> <mount point, e.g. /mnt/fuse>`?
>
> Regards
>
> Rafi KC
>
> On 11/5/19 11:05 PM, David Spisla wrote:
>
> I did the test with Gluster 7.0 and ctime disabled, but it had no effect
> (all values in MiB/s):
>
>   64KiB    1MiB    10MiB
>   0,16     2,60    54,74
>
> Attached is now the complete profile file, also with the results from
> the last test. I will not repeat it with a higher inode size because I
> don't think this will have an effect.
> There must be another cause for the low performance.
>
> Yes. No need to try with a higher inode size.
>
> Regards
> David Spisla
>
> Am Di., 5. Nov. 2019 um 16:25 Uhr schrieb David Spisla <spisla80 at gmail.com>:
>
>> Am Di., 5. Nov. 2019 um 12:06 Uhr schrieb RAFI KC <rkavunga at redhat.com>:
>>
>>> On 11/4/19 8:46 PM, David Spisla wrote:
>>>
>>> Dear Gluster Community,
>>>
>>> I also have an issue concerning performance. In the last days I updated
>>> our test cluster from GlusterFS v5.5 to v7.0.
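A note on the xattr-based client profiling discussed above: `setxattr` in the quoted mail is not an actual command line tool; the usual invocation uses `setfattr` against the FUSE mount point (not a brick backend path). A minimal sketch, assuming a hypothetical FUSE mount at /mnt/fuse:

```shell
# Trigger an io-stats dump on the FUSE mount point (assumed: /mnt/fuse).
# The xattr value names the file the client writes its statistics to.
# The xattr is virtual (intercepted by the io-stats translator), so it
# will not show up in getfattr output afterwards.
setfattr -n trusted.io-stats-dump -v /tmp/iostat.log /mnt/fuse

# Check whether the dump was written:
ls -l /tmp/iostat.log
```

Running it against a brick directory instead of a Gluster mount point would explain seeing no effect, since the xattr is only interpreted by the client-side translator stack.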
The setup in general:
>>>
>>> 2 HP DL380 servers with 10Gbit NICs, 1 Distribute-Replica 2 volume with
>>> 2 replica pairs. Client is SMB Samba (access via vfs_glusterfs). I did
>>> several tests to ensure that Samba doesn't cause the drop.
>>> The setup is completely the same except for the Gluster version.
>>> Here are my results:
>>>
>>>   64KiB    1MiB     10MiB     (file size)
>>>   3,49     47,41    300,50    (values in MiB/s with GlusterFS v5.5)
>>>   0,16     2,61     76,63     (values in MiB/s with GlusterFS v7.0)
>>>
>>> Can you please share the profile information [1] for both versions?
>>> Also it would be really helpful if you can mention the io patterns that
>>> were used for these tests.
>>>
>>> [1]:
>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Monitoring%20Workload/
>>
>> Hello Rafi,
>> thank you for your help.
>>
>> * First, more information about the io patterns: As a client we use a
>> DL360 Windows Server 2017 machine with a 10Gbit NIC connected to the
>> storage machines. The share is mounted via SMB and the tests write with
>> fio. We use these job files (see attachment). Each job file is executed
>> separately and there is a sleep of about 60s between test runs to calm
>> down the system before starting a new test.
>>
>> * Attached below you find the profile output from the tests with v5.5
>> (ctime enabled) and v7.0 (ctime enabled).
>>
>> * Besides the tests with Samba I also did some fio tests directly on
>> the FUSE mounts (locally on one of the storage nodes). The results show
>> only a small decrease in performance between v5.5 and v7.0
>> (all values in MiB/s):
>>
>>   64KiB    1MiB      10MiB
>>   50,09    679,96    1023,02    (v5.5)
>>   47,00    656,46    977,60     (v7.0)
>>
>> It seems that the combination of Samba + Gluster 7.0 has a lot of
>> problems, doesn't it?
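The fio job files themselves are only referenced as attachments and are not part of this mail. A sequential-write job of the kind described (one small file size per job, run from the Windows client against the SMB share) might look roughly like this; the ioengine, drive letter, and file count are assumptions for illustration, not the actual attached jobs:

```ini
; Hypothetical fio job for the 64KiB test (the real job files were
; attachments to the original mail and are not reproduced here).
[global]
ioengine=windowsaio      ; client is a Windows machine writing to the share
direct=1
rw=write
bs=64k

[smallfile-64k]
directory=Z:\testdir     ; assumed drive letter of the mounted SMB share
nrfiles=1000             ; assumed file count
filesize=64k             ; one 64 KiB file each
```

The 1MiB and 10MiB runs would be the same job with `bs`/`filesize` adjusted.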
>>> We use these volume options (GlusterFS 7.0):
>>>
>>> Volume Name: archive1
>>> Type: Distributed-Replicate
>>> Volume ID: 44c17844-0bd4-4ca2-98d8-a1474add790c
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 2 x 2 = 4
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: fs-dl380-c1-n1:/gluster/brick1/glusterbrick
>>> Brick2: fs-dl380-c1-n2:/gluster/brick1/glusterbrick
>>> Brick3: fs-dl380-c1-n1:/gluster/brick2/glusterbrick
>>> Brick4: fs-dl380-c1-n2:/gluster/brick2/glusterbrick
>>> Options Reconfigured:
>>> performance.client-io-threads: off
>>> nfs.disable: on
>>> storage.fips-mode-rchecksum: on
>>> transport.address-family: inet
>>> user.smb: disable
>>> features.read-only: off
>>> features.worm: off
>>> features.worm-file-level: on
>>> features.retention-mode: enterprise
>>> features.default-retention-period: 120
>>> network.ping-timeout: 10
>>> features.cache-invalidation: on
>>> features.cache-invalidation-timeout: 600
>>> performance.nl-cache: on
>>> performance.nl-cache-timeout: 600
>>> client.event-threads: 32
>>> server.event-threads: 32
>>> cluster.lookup-optimize: on
>>> performance.stat-prefetch: on
>>> performance.cache-invalidation: on
>>> performance.md-cache-timeout: 600
>>> performance.cache-samba-metadata: on
>>> performance.cache-ima-xattrs: on
>>> performance.io-thread-count: 64
>>> cluster.use-compound-fops: on
>>> performance.cache-size: 512MB
>>> performance.cache-refresh-timeout: 10
>>> performance.read-ahead: off
>>> performance.write-behind-window-size: 4MB
>>> performance.write-behind: on
>>> storage.build-pgfid: on
>>> features.ctime: on
>>> cluster.quorum-type: fixed
>>> cluster.quorum-count: 1
>>> features.bitrot: on
>>> features.scrub: Active
>>> features.scrub-freq: daily
>>>
>>> For GlusterFS 5.5 it's nearly the same, except that there were 2
>>> options to enable the ctime feature.
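For reference, the server-side profile information requested earlier in the thread is collected with the `gluster volume profile` commands; a sketch, using the volume name archive1 from the listing above:

```shell
# Start server-side profiling for the volume (name taken from the
# volume info above), run the benchmark, capture the counters, stop.
gluster volume profile archive1 start
# ... run the fio workload against the SMB share or FUSE mount ...
gluster volume profile archive1 info > /tmp/profile_archive1_v7.txt
gluster volume profile archive1 stop
```

Capturing one such file per Gluster version (v5.5 and v7.0) under the same workload is what allows the per-fop call counts to be compared.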
>>> Ctime stores additional metadata information as extended attributes,
>>> which sometimes exceed the default inode size. In such scenarios the
>>> additional xattrs won't fit into the default size. This results in
>>> additional blocks being used to store xattrs outside the inode, which
>>> will affect the latency. This is purely dependent on the i/o operations
>>> and the total xattr size stored in the inode.
>>>
>>> Is it possible for you to repeat the test by disabling ctime or
>>> increasing the inode size to a higher value, say 1024KB?
>>
>> I will do so, but for today I could not finish the tests with ctime
>> disabled (or a higher inode value) because they take a lot of time with
>> v7.0 due to the low performance; I will perform them tomorrow. As soon
>> as possible I will give you the results.
>> By the way: Do you really mean an inode size of 1024KB on the XFS layer?
>> Or do you mean 1024 bytes? We use 512 bytes per default, because this is
>> the recommended size until now. But it seems there is a need for a new
>> recommendation when using the ctime feature as a default. I cannot
>> imagine that this is the real cause of the low performance, because in
>> v5.5 we also use the ctime feature with an inode size of 512 bytes.
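On the XFS inode size question: the size can be inspected at any time, but it can only be chosen when the filesystem is created. A sketch, with the brick mount path taken from the volume info and a placeholder device node:

```shell
# Inspect the current inode size of a brick filesystem; look for the
# "isize=" field in the output (512 is the value discussed here):
xfs_info /gluster/brick1

# The inode size is fixed at mkfs time. Reformatting a brick with
# 1024-byte inodes (DESTRUCTIVE - wipes the brick) would look like:
mkfs.xfs -f -i size=1024 /dev/sdX1    # /dev/sdX1 is a placeholder device
```

This matches the later correction in the thread: valid XFS inode sizes are powers of two between 256 bytes and 2048 bytes, so "1024KB" can only have meant 1024 bytes.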
>> Regards
>> David
>>
>>> Our optimization for Samba looks like this (for every version):
>>>
>>> [global]
>>> workgroup = SAMBA
>>> netbios name = CLUSTER
>>> kernel share modes = no
>>> aio read size = 1
>>> aio write size = 1
>>> kernel oplocks = no
>>> max open files = 100000
>>> nt acl support = no
>>> security = user
>>> server min protocol = SMB2
>>> store dos attributes = no
>>> strict locking = no
>>> full_audit:failure = pwrite_send pwrite_recv pwrite offload_write_send
>>> offload_write_recv create_file open unlink connect disconnect rename chown
>>> fchown lchown chmod fchmod mkdir rmdir ntimes ftruncate fallocate
>>> full_audit:success = pwrite_send pwrite_recv pwrite offload_write_send
>>> offload_write_recv create_file open unlink connect disconnect rename chown
>>> fchown lchown chmod fchmod mkdir rmdir ntimes ftruncate fallocate
>>> full_audit:facility = local5
>>> durable handles = yes
>>> posix locking = no
>>> log level = 2
>>> max log size = 100000
>>> debug pid = yes
>>>
>>> What can be the cause of this rapid drop in performance for small
>>> files? Are some of our vol options not recommended anymore?
>>> There were some patches concerning performance for small files in v6.0
>>> and v7.0:
>>>
>>> #1670031 <https://bugzilla.redhat.com/1670031>: performance regression
>>> seen with smallfile workload tests
>>>
>>> #1659327 <https://bugzilla.redhat.com/1659327>: 43% regression in
>>> small-file sequential read performance
>>>
>>> And one patch for the io-cache:
>>>
>>> #1659869 <https://bugzilla.redhat.com/1659869>: improvements to io-cache
>>>
>>> Regards
>>>
>>> David Spisla
>>>
>>> ________
>>>
>>> Community Meeting Calendar:
>>>
>>> APAC Schedule -
>>> Every 2nd and 4th Tuesday at 11:30 AM IST
>>> Bridge: https://bluejeans.com/118564314
>>>
>>> NA/EMEA Schedule -
>>> Every 1st and 3rd Tuesday at 01:00 PM EDT
>>> Bridge: https://bluejeans.com/118564314
>>>
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20191106/89324bd1/attachment.html>
RAFI KC
2019-Nov-06 10:16 UTC
[Gluster-users] Performance is falling rapidly when updating from v5.5 to v7.0
On 11/6/19 3:42 PM, David Spisla wrote:

> Hello Rafi,
>
> I tried to set the xattr via
>
>     setfattr -n trusted.io-stats-dump -v '/tmp/iostat.log' /gluster/repositories/repo1/
>
> but it had no effect. There is no such xattr via getfattr and no
> logfile. The command setxattr is not available. What am I doing wrong?

I will check it out and get back to you.

> By the way, did you mean to increase the inode size of the XFS layer
> from 512 bytes to 1024KB(!)? I think it should be 1024 bytes, because
> 2048 bytes is the maximum.

It was a typo, I meant 1024 bytes, sorry for that.

> Regards
> David

[...]