David Spisla
2019-Nov-04 15:16 UTC
[Gluster-users] Performance is falling rapidly when updating from v5.5 to v7.0
Dear Gluster Community,

I also have an issue concerning performance. Over the last days I updated our test cluster from GlusterFS v5.5 to v7.0. The setup in general:

2 HP DL380 servers with 10Gbit NICs, one Distributed-Replicate volume with 2 replica pairs. The client is Samba (access via vfs_glusterfs). I ran several tests to ensure that Samba is not causing the drop. The setup is completely identical except for the Gluster version. Here are my results:

File size         64 KiB    1 MiB    10 MiB
GlusterFS v5.5     3.49     47.41    300.50   (MiB/s)
GlusterFS v7.0     0.16      2.61     76.63   (MiB/s)

We use these volume options (GlusterFS 7.0):

Volume Name: archive1
Type: Distributed-Replicate
Volume ID: 44c17844-0bd4-4ca2-98d8-a1474add790c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: fs-dl380-c1-n1:/gluster/brick1/glusterbrick
Brick2: fs-dl380-c1-n2:/gluster/brick1/glusterbrick
Brick3: fs-dl380-c1-n1:/gluster/brick2/glusterbrick
Brick4: fs-dl380-c1-n2:/gluster/brick2/glusterbrick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
user.smb: disable
features.read-only: off
features.worm: off
features.worm-file-level: on
features.retention-mode: enterprise
features.default-retention-period: 120
network.ping-timeout: 10
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
performance.nl-cache-timeout: 600
client.event-threads: 32
server.event-threads: 32
cluster.lookup-optimize: on
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-ima-xattrs: on
performance.io-thread-count: 64
cluster.use-compound-fops: on
performance.cache-size: 512MB
performance.cache-refresh-timeout: 10
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
storage.build-pgfid: on
features.ctime: on
cluster.quorum-type: fixed
cluster.quorum-count: 1
features.bitrot: on
features.scrub: Active
features.scrub-freq: daily

For GlusterFS 5.5 the options are nearly the same, except that there were two options to enable the ctime feature.

Our optimization for Samba looks like this (for every version):

[global]
workgroup = SAMBA
netbios name = CLUSTER
kernel share modes = no
aio read size = 1
aio write size = 1
kernel oplocks = no
max open files = 100000
nt acl support = no
security = user
server min protocol = SMB2
store dos attributes = no
strict locking = no
full_audit:failure = pwrite_send pwrite_recv pwrite offload_write_send offload_write_recv create_file open unlink connect disconnect rename chown fchown lchown chmod fchmod mkdir rmdir ntimes ftruncate fallocate
full_audit:success = pwrite_send pwrite_recv pwrite offload_write_send offload_write_recv create_file open unlink connect disconnect rename chown fchown lchown chmod fchmod mkdir rmdir ntimes ftruncate fallocate
full_audit:facility = local5
durable handles = yes
posix locking = no
log level = 2
max log size = 100000
debug pid = yes

What can be the cause of this rapid drop in performance for small files? Are some of our volume options no longer recommended? There were some patches concerning small-file performance in v6.0 and v7.0:

#1670031 <https://bugzilla.redhat.com/1670031>: performance regression seen with smallfile workload tests
#1659327 <https://bugzilla.redhat.com/1659327>: 43% regression in small-file sequential read performance

And one patch for the io-cache:

#1659869 <https://bugzilla.redhat.com/1659869>: improvements to io-cache

Regards
David Spisla
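To put the drop in numbers, the slowdown factors implied by the table above work out as follows (values copied from the post, decimal commas read as points):

```python
# Throughput in MiB/s as reported: GlusterFS v5.5 vs v7.0,
# for 64 KiB, 1 MiB, and 10 MiB files.
v55 = [3.49, 47.41, 300.50]
v70 = [0.16, 2.61, 76.63]

# Slowdown factor = v5.5 throughput / v7.0 throughput.
slowdown = [round(old / new, 1) for old, new in zip(v55, v70)]
print(slowdown)  # [21.8, 18.2, 3.9]
```

The regression is worst for the smallest files (roughly 22x at 64 KiB) and shrinks to about 4x at 10 MiB, which is consistent with per-file/metadata overhead rather than a raw bandwidth problem.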
RAFI KC
2019-Nov-05 11:06 UTC
[Gluster-users] Performance is falling rapidly when updating from v5.5 to v7.0
On 11/4/19 8:46 PM, David Spisla wrote:
> Dear Gluster Community,
>
> I also have an issue concerning performance. Over the last days I updated
> our test cluster from GlusterFS v5.5 to v7.0. The setup in general:
>
> 2 HP DL380 servers with 10Gbit NICs, one Distributed-Replicate volume
> with 2 replica pairs. The client is Samba (access via vfs_glusterfs).
> I ran several tests to ensure that Samba is not causing the drop.
> The setup is completely identical except for the Gluster version.
> Here are my results:
>
> File size         64 KiB    1 MiB    10 MiB
> GlusterFS v5.5     3.49     47.41    300.50   (MiB/s)
> GlusterFS v7.0     0.16      2.61     76.63   (MiB/s)

Can you please share the profile information [1] for both versions? It would also be really helpful if you could mention the I/O patterns used for these tests.

[1]: https://docs.gluster.org/en/latest/Administrator%20Guide/Monitoring%20Workload/

> We use these volume options (GlusterFS 7.0):
>
> [...]
> For GlusterFS 5.5 the options are nearly the same, except that there
> were two options to enable the ctime feature.

Ctime stores additional metadata as extended attributes, and this sometimes exceeds the default inode size. In such a scenario the additional xattrs won't fit into the inode, and extra blocks have to be allocated to store them, which affects latency. Whether this happens depends purely on the I/O operations and the total size of the xattrs stored in the inode.
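To check how much xattr data a file actually carries, the extended attributes can be dumped directly on a brick (a sketch; the file path is a placeholder, `getfattr` comes from the `attr` package, and the ctime metadata lives in the `trusted.glusterfs.mdata` xattr):

```shell
# Dump all extended attributes of a file on the brick backend, hex-encoded.
# Run this on the server node that hosts the brick.
getfattr -d -m . -e hex /gluster/brick1/glusterbrick/path/to/testfile
```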
Is it possible for you to repeat the test after disabling ctime, or after increasing the inode size of the brick filesystem to a higher value, say 1024 bytes?

> Our optimization for Samba looks like this (for every version):
>
> [...]
>
> What can be the cause of this rapid drop in performance for small
> files? Are some of our volume options no longer recommended?
> There were some patches concerning small-file performance in v6.0
> and v7.0:
>
> #1670031 <https://bugzilla.redhat.com/1670031>: performance regression
> seen with smallfile workload tests
>
> #1659327 <https://bugzilla.redhat.com/1659327>: 43% regression in
> small-file sequential read performance
>
> And one patch for the io-cache:
>
> #1659869 <https://bugzilla.redhat.com/1659869>: improvements to io-cache
>
> Regards
> David Spisla
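For reference, the requested profiling and the two suggested experiments could be run roughly as follows (a sketch using the standard gluster CLI; the volume name is taken from the thread, the brick device path is a placeholder, and reformatting a brick destroys its data, so test setups only):

```shell
# Collect the requested profile data (per-brick FOP counts and latencies).
gluster volume profile archive1 start
# ... run the benchmark workload through Samba ...
gluster volume profile archive1 info
gluster volume profile archive1 stop

# Experiment 1: disable the ctime feature and re-run the benchmark.
gluster volume set archive1 features.ctime off

# Experiment 2: recreate a brick with larger inodes so the xattrs fit
# inline (Gluster's setup guide recommends at least 512-byte inodes).
# WARNING: this wipes the brick.
mkfs.xfs -f -i size=1024 /dev/<brick-device>
```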