Strahil
2019-Jul-03 15:55 UTC
[Gluster-users] Extremely low performance - am I doing something wrong?
Check the following link (4.1) for the optimal gluster volume settings. They are quite safe.

Gluster provides a group called virt (/var/lib/glusterd/groups/virt), and it can be applied via 'gluster volume set VOLNAME group virt'.

Then try again.

Best Regards,
Strahil Nikolov

On Jul 3, 2019 11:39, Vladimir Melnik <v.melnik at tucha.ua> wrote:
>
> Dear colleagues,
>
> I have a lab with a bunch of virtual machines (the virtualization is
> provided by KVM) running on the same physical host. 4 of these VMs are
> working as a GlusterFS cluster and there's one more VM that works as a
> client. I'll specify all the packages' versions at the end of this
> message.
>
> I created 2 volumes - one has the type "Distributed-Replicate" and
> the other one is "Distribute". The problem is that both volumes are
> showing really poor performance.
>
> Here's what I see on the client:
> $ mount | grep gluster
> 10.13.1.16:storage1 on /mnt/glusterfs1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> 10.13.1.16:storage2 on /mnt/glusterfs2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
> $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 1.47936 s, 7.1 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 1.62546 s, 6.5 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 1.71229 s, 6.1 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 1.68607 s, 6.2 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 1.82204 s, 5.8 MB/s
>
> $ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs2/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs2/test.tmp; } done
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 1.15739 s, 9.1 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.978528 s, 10.7 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.910642 s, 11.5 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.998249 s, 10.5 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 1.03377 s, 10.1 MB/s
>
> The distributed one shows a bit better performance than the
> distributed-replicated one, but it's still poor. :-(
>
> The disk storage itself is OK; here's what I see on each of the 4 GlusterFS
> servers:
> for i in {1..5}; do { dd if=/dev/zero of=/mnt/storage1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/storage1/test.tmp; } done
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.0656698 s, 160 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.0476927 s, 220 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.036526 s, 287 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.0329145 s, 319 MB/s
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 0.0403988 s, 260 MB/s
>
> The network between all 5 VMs is OK; they are all running on the same
> physical host.
>
> I can't understand what I'm doing wrong. :-(
>
> Here's the detailed info about the volumes:
> Volume Name: storage1
> Type: Distributed-Replicate
> Volume ID: a42e2554-99e5-4331-bcc4-0900d002ae32
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick1
> Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage1/brick2
> Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter)
> Brick4: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick3
> Brick5: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage1/brick4
> Brick6: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter)
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
> Volume Name: storage2
> Type: Distribute
> Volume ID: df4d8096-ad03-493e-9e0e-586ce21fb067
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 4
> Transport-type: tcp
> Bricks:
> Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage2
> Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage2
> Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage2
> Brick4: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage2
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
>
> The OS is CentOS Linux release 7.6.1810. The packages I'm using are:
> glusterfs-6.3-1.el7.x86_64
> glusterfs-api-6.3-1.el7.x86_64
> glusterfs-cli-6.3-1.el7.x86_64
> glusterfs-client-xlators-6.3-1.el7.x86_64
> glusterfs-fuse-6.3-1.el7.x86_64
> glusterfs-libs-6.3-1.el7.x86_64
> glusterfs-server-6.3-1.el7.x86_64
> kernel-3.10.0-327.el7.x86_64
> kernel-3.10.0-514.2.2.el7.x86_64
> kernel-3.10.0-957.12.1.el7.x86_64
> kernel-3.10.0-957.12.2.el7.x86_64
> kernel-3.10.0-957.21.3.el7.x86_64
> kernel-tools-3.10.0-957.21.3.el7.x86_64
> kernel-tools-libs-3.10.0-957.21.3.el7.x86_6
>
> Please be so kind as to help me understand: did I do something wrong, or
> is this quite normal performance for GlusterFS?
>
> Thanks in advance!
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
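
For reference, a minimal sketch of the suggestion above, applied to the two volumes from the original message (the volume names storage1 and storage2 are taken from that message; adjust as needed):

  # Apply the predefined "virt" option group to each volume
  gluster volume set storage1 group virt
  gluster volume set storage2 group virt

  # Review which options were reconfigured as a result
  gluster volume info storage1
  gluster volume info storage2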
Vladimir Melnik
2019-Jul-03 16:18 UTC
[Gluster-users] Extremely low performance - am I doing something wrong?
Thank you, it helped a little:

$ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done 2>&1 | grep copied
10485760 bytes (10 MB) copied, 0.738968 s, 14.2 MB/s
10485760 bytes (10 MB) copied, 0.725296 s, 14.5 MB/s
10485760 bytes (10 MB) copied, 0.681508 s, 15.4 MB/s
10485760 bytes (10 MB) copied, 0.85566 s, 12.3 MB/s
10485760 bytes (10 MB) copied, 0.661457 s, 15.9 MB/s

But 14-15 MB/s is still quite far from the actual storage's performance (~200-300 MB/s). :-(

Here's the full configuration dump (just in case):
Option  Value
------  -----
cluster.lookup-unhashed  on
cluster.lookup-optimize  on
cluster.min-free-disk  10%
cluster.min-free-inodes  5%
cluster.rebalance-stats  off
cluster.subvols-per-directory  (null)
cluster.readdir-optimize  off
cluster.rsync-hash-regex  (null)
cluster.extra-hash-regex  (null)
cluster.dht-xattr-name  trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid  off
cluster.rebal-throttle  normal
cluster.lock-migration  off
cluster.force-migration  off
cluster.local-volume-name  (null)
cluster.weighted-rebalance  on
cluster.switch-pattern  (null)
cluster.entry-change-log  on
cluster.read-subvolume  (null)
cluster.read-subvolume-index  -1
cluster.read-hash-mode  1
cluster.background-self-heal-count  8
cluster.metadata-self-heal  off
cluster.data-self-heal  off
cluster.entry-self-heal  off
cluster.self-heal-daemon  on
cluster.heal-timeout  600
cluster.self-heal-window-size  1
cluster.data-change-log  on
cluster.metadata-change-log  on
cluster.data-self-heal-algorithm  full
cluster.eager-lock  enable
disperse.eager-lock  on
disperse.other-eager-lock  on
disperse.eager-lock-timeout  1
disperse.other-eager-lock-timeout  1
cluster.quorum-type  auto
cluster.quorum-count  (null)
cluster.choose-local  off
cluster.self-heal-readdir-size  1KB
cluster.post-op-delay-secs  1
cluster.ensure-durability  on
cluster.consistent-metadata  no
cluster.heal-wait-queue-length  128
cluster.favorite-child-policy  none
cluster.full-lock  yes
diagnostics.latency-measurement  off
diagnostics.dump-fd-stats  off
diagnostics.count-fop-hits  off
diagnostics.brick-log-level  INFO
diagnostics.client-log-level  INFO
diagnostics.brick-sys-log-level  CRITICAL
diagnostics.client-sys-log-level  CRITICAL
diagnostics.brick-logger  (null)
diagnostics.client-logger  (null)
diagnostics.brick-log-format  (null)
diagnostics.client-log-format  (null)
diagnostics.brick-log-buf-size  5
diagnostics.client-log-buf-size  5
diagnostics.brick-log-flush-timeout  120
diagnostics.client-log-flush-timeout  120
diagnostics.stats-dump-interval  0
diagnostics.fop-sample-interval  0
diagnostics.stats-dump-format  json
diagnostics.fop-sample-buf-size  65535
diagnostics.stats-dnscache-ttl-sec  86400
performance.cache-max-file-size  0
performance.cache-min-file-size  0
performance.cache-refresh-timeout  1
performance.cache-priority
performance.cache-size  32MB
performance.io-thread-count  16
performance.high-prio-threads  16
performance.normal-prio-threads  16
performance.low-prio-threads  32
performance.least-prio-threads  1
performance.enable-least-priority  on
performance.iot-watchdog-secs  (null)
performance.iot-cleanup-disconnected-reqs  off
performance.iot-pass-through  false
performance.io-cache-pass-through  false
performance.cache-size  128MB
performance.qr-cache-timeout  1
performance.cache-invalidation  false
performance.ctime-invalidation  false
performance.flush-behind  on
performance.nfs.flush-behind  on
performance.write-behind-window-size  1MB
performance.resync-failed-syncs-after-fsync  off
performance.nfs.write-behind-window-size  1MB
performance.strict-o-direct  off
performance.nfs.strict-o-direct  off
performance.strict-write-ordering  off
performance.nfs.strict-write-ordering  off
performance.write-behind-trickling-writes  on
performance.aggregate-size  128KB
performance.nfs.write-behind-trickling-writes  on
performance.lazy-open  yes
performance.read-after-open  yes
performance.open-behind-pass-through  false
performance.read-ahead-page-count  4
performance.read-ahead-pass-through  false
performance.readdir-ahead-pass-through  false
performance.md-cache-pass-through  false
performance.md-cache-timeout  1
performance.cache-swift-metadata  true
performance.cache-samba-metadata  false
performance.cache-capability-xattrs  true
performance.cache-ima-xattrs  true
performance.md-cache-statfs  off
performance.xattr-cache-list
performance.nl-cache-pass-through  false
features.encryption  off
network.frame-timeout  1800
network.ping-timeout  42
network.tcp-window-size  (null)
client.ssl  off
network.remote-dio  enable
client.event-threads  4
client.tcp-user-timeout  0
client.keepalive-time  20
client.keepalive-interval  2
client.keepalive-count  9
network.tcp-window-size  (null)
network.inode-lru-limit  16384
auth.allow  *
auth.reject  (null)
transport.keepalive  1
server.allow-insecure  on
server.root-squash  off
server.all-squash  off
server.anonuid  65534
server.anongid  65534
server.statedump-path  /var/run/gluster
server.outstanding-rpc-limit  64
server.ssl  off
auth.ssl-allow  *
server.manage-gids  off
server.dynamic-auth  on
client.send-gids  on
server.gid-timeout  300
server.own-thread  (null)
server.event-threads  4
server.tcp-user-timeout  42
server.keepalive-time  20
server.keepalive-interval  2
server.keepalive-count  9
transport.listen-backlog  1024
transport.address-family  inet
performance.write-behind  on
performance.read-ahead  off
performance.readdir-ahead  on
performance.io-cache  off
performance.open-behind  on
performance.quick-read  off
performance.nl-cache  off
performance.stat-prefetch  on
performance.client-io-threads  on
performance.nfs.write-behind  on
performance.nfs.read-ahead  off
performance.nfs.io-cache  off
performance.nfs.quick-read  off
performance.nfs.stat-prefetch  off
performance.nfs.io-threads  off
performance.force-readdirp  true
performance.cache-invalidation  false
performance.global-cache-invalidation  true
features.uss  off
features.snapshot-directory  .snaps
features.show-snapshot-directory  off
features.tag-namespaces  off
network.compression  off
network.compression.window-size  -15
network.compression.mem-level  8
network.compression.min-size  0
network.compression.compression-level  -1
network.compression.debug  false
features.default-soft-limit  80%
features.soft-timeout  60
features.hard-timeout  5
features.alert-time  86400
features.quota-deem-statfs  off
geo-replication.indexing  off
geo-replication.indexing  off
geo-replication.ignore-pid-check  off
geo-replication.ignore-pid-check  off
features.quota  off
features.inode-quota  off
features.bitrot  disable
debug.trace  off
debug.log-history  no
debug.log-file  no
debug.exclude-ops  (null)
debug.include-ops  (null)
debug.error-gen  off
debug.error-failure  (null)
debug.error-number  (null)
debug.random-failure  off
debug.error-fops  (null)
nfs.disable  on
features.read-only  off
features.worm  off
features.worm-file-level  off
features.worm-files-deletable  on
features.default-retention-period  120
features.retention-mode  relax
features.auto-commit-period  180
storage.linux-aio  off
storage.batch-fsync-mode  reverse-fsync
storage.batch-fsync-delay-usec  0
storage.owner-uid  -1
storage.owner-gid  -1
storage.node-uuid-pathinfo  off
storage.health-check-interval  30
storage.build-pgfid  off
storage.gfid2path  on
storage.gfid2path-separator  :
storage.reserve  1
storage.health-check-timeout  10
storage.fips-mode-rchecksum  off
storage.force-create-mode  0000
storage.force-directory-mode  0000
storage.create-mask  0777
storage.create-directory-mask  0777
storage.max-hardlinks  100
features.ctime  on
config.gfproxyd  off
cluster.server-quorum-type  server
cluster.server-quorum-ratio  0
changelog.changelog  off
changelog.changelog-dir  {{ brick.path }}/.glusterfs/changelogs
changelog.encoding  ascii
changelog.rollover-time  15
changelog.fsync-interval  5
changelog.changelog-barrier-timeout  120
changelog.capture-del-path  off
features.barrier  disable
features.barrier-timeout  120
features.trash  off
features.trash-dir  .trashcan
features.trash-eliminate-path  (null)
features.trash-max-filesize  5MB
features.trash-internal-op  off
cluster.enable-shared-storage  disable
locks.trace  off
locks.mandatory-locking  off
cluster.disperse-self-heal-daemon  enable
cluster.quorum-reads  no
client.bind-insecure  (null)
features.shard  on
features.shard-block-size  64MB
features.shard-lru-limit  16384
features.shard-deletion-rate  100
features.scrub-throttle  lazy
features.scrub-freq  biweekly
features.scrub  false
features.expiry-time  120
features.cache-invalidation  off
features.cache-invalidation-timeout  60
features.leases  off
features.lease-lock-recall-timeout  60
disperse.background-heals  8
disperse.heal-wait-qlength  128
cluster.heal-timeout  600
dht.force-readdirp  on
disperse.read-policy  gfid-hash
cluster.shd-max-threads  8
cluster.shd-wait-qlength  10000
cluster.shd-wait-qlength  10000
cluster.locking-scheme  granular
cluster.granular-entry-heal  no
features.locks-revocation-secs  0
features.locks-revocation-clear-all  false
features.locks-revocation-max-blocked  0
features.locks-monkey-unlocking  false
features.locks-notify-contention  no
features.locks-notify-contention-delay  5
disperse.shd-max-threads  1
disperse.shd-wait-qlength  1024
disperse.cpu-extensions  auto
disperse.self-heal-window-size  1
cluster.use-compound-fops  off
performance.parallel-readdir  off
performance.rda-request-size  131072
performance.rda-low-wmark  4096
performance.rda-high-wmark  128KB
performance.rda-cache-limit  10MB
performance.nl-cache-positive-entry  false
performance.nl-cache-limit  10MB
performance.nl-cache-timeout  60
cluster.brick-multiplex  off
cluster.max-bricks-per-process  250
disperse.optimistic-change-log  on
disperse.stripe-cache  4
cluster.halo-enabled  False
cluster.halo-shd-max-latency  99999
cluster.halo-nfsd-max-latency  5
cluster.halo-max-latency  5
cluster.halo-max-replicas  99999
cluster.halo-min-replicas  2
features.selinux  on
cluster.daemon-log-level  INFO
debug.delay-gen  off
delay-gen.delay-percentage  10%
delay-gen.delay-duration  100000
delay-gen.enable
disperse.parallel-writes  on
features.sdfs  off
features.cloudsync  off
features.ctime  on
ctime.noatime  on
feature.cloudsync-storetype  (null)
features.enforce-mandatory-lock  off

What do you think, are there any other knobs worth turning?

Thanks!

On Wed, Jul 03, 2019 at 06:55:09PM +0300, Strahil wrote:
> Check the following link (4.1) for the optimal gluster volume settings.
> They are quite safe.
>
> Gluster provides a group called virt (/var/lib/glusterd/groups/virt), and it can be applied via 'gluster volume set VOLNAME group virt'.
>
> Then try again.
>
> Best Regards,
> Strahil Nikolov
> On Jul 3, 2019 11:39, Vladimir Melnik <v.melnik at tucha.ua> wrote:
> [...]

--
V.Melnik
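
For comparison, one way (a sketch, not taken from the thread) to see which of the knobs above the virt group is meant to change, and what they are currently set to, is to read the group file mentioned earlier and query each option with 'gluster volume get'; this assumes the group file uses the usual key=value layout and uses the volume name storage1 from this thread:

  # List the options defined in the predefined "virt" group
  cat /var/lib/glusterd/groups/virt

  # Print the currently effective value of each of those options on storage1
  for opt in $(cut -d= -f1 /var/lib/glusterd/groups/virt); do
      gluster volume get storage1 "$opt"
  done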