Marc Jakobs
2020-Nov-09 20:59 UTC
[Gluster-users] Very poor GlusterFS Volume performance (glusterfs 8.2)
Hello,

I have a GlusterFS volume on three Linux servers (Ubuntu 20.04 LTS) which are connected to each other via 1 Gbit/s NICs over a dedicated switch. Every server has an NVMe disk which is used for the GlusterFS volume called "data".

When I run a dd performance test directly on the NVMe disk, I get the following result:

    dd if=/dev/zero of=/opt/data/testfile bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.25716 s, 854 MB/s

When I run the same command on the locally mounted GlusterFS, the result looks like this:

    dd if=/dev/zero of=./testfile bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 18.5137 s, 58.0 MB/s

or with oflag=sync:

    dd if=/dev/zero of=./testfile bs=1G count=1 oflag=sync
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 45.1265 s, 23.8 MB/s

I have mounted the volume like this:

    mount -t glusterfs -o direct-io-mode=disable 127.0.0.1:/data /mnt/test/

so it does not even go over the local NIC but instead over the loopback device.

I have also read a couple of GlusterFS performance tuning websites and set the following options:

    gluster vol set data nfs.disable on
    gluster volume set data performance.cache-size 1GB
    gluster volume set data performance.write-behind-window-size 1GB
    gluster volume set data performance.io-thread-count 32
    gluster volume set data performance.io-cache on

with no success: the write performance does not change. It stays at exactly 58.0 MB/s, which is far below the maximum write speed of the NVMe disk.

How can I improve the performance? Is this "normal"? From my Google searches it seems that quite a few people have this problem, but I cannot find information on how to dramatically improve it. What write speed do you get on your GlusterFS volumes?

Here are my current volume "performance" settings. Thanks for any help!
Marc

---

performance.cache-max-file-size            0
performance.cache-min-file-size            0
performance.cache-refresh-timeout          1
performance.cache-priority
performance.cache-size                     1GB
performance.io-thread-count                32
performance.high-prio-threads              16
performance.normal-prio-threads            16
performance.low-prio-threads               16
performance.least-prio-threads             1
performance.enable-least-priority          on
performance.iot-watchdog-secs              (null)
performance.iot-cleanup-disconnected-reqs  off
performance.iot-pass-through               false
performance.io-cache-pass-through          false
performance.cache-size                     1GB
performance.qr-cache-timeout               1
performance.quick-read-cache-invalidation  false
performance.ctime-invalidation             false
performance.flush-behind                   on
performance.nfs.flush-behind               on
performance.write-behind-window-size       1GB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size   1MB
performance.strict-o-direct                off
performance.nfs.strict-o-direct            off
performance.strict-write-ordering          off
performance.nfs.strict-write-ordering      off
performance.write-behind-trickling-writes  on
performance.aggregate-size                 128KB
performance.nfs.write-behind-trickling-writes on
performance.lazy-open                      yes
performance.read-after-open                yes
performance.open-behind-pass-through       false
performance.read-ahead-page-count          4
performance.read-ahead-pass-through        false
performance.readdir-ahead-pass-through     false
performance.md-cache-pass-through          false
performance.md-cache-timeout               1
performance.cache-swift-metadata           false
performance.cache-samba-metadata           false
performance.cache-capability-xattrs        true
performance.cache-ima-xattrs               true
performance.md-cache-statfs                off
performance.xattr-cache-list
performance.nl-cache-pass-through          false
performance.write-behind                   on
performance.read-ahead                     off
performance.readdir-ahead                  off
performance.io-cache                       on
performance.open-behind                    on
performance.quick-read                     on
performance.nl-cache                       off
performance.stat-prefetch                  on
performance.client-io-threads              off
performance.nfs.write-behind               on
performance.nfs.read-ahead                 off
performance.nfs.io-cache                   off
performance.nfs.quick-read                 off
performance.nfs.stat-prefetch              off
performance.nfs.io-threads                 off
performance.force-readdirp                 true
performance.cache-invalidation             false
performance.global-cache-invalidation      true
performance.parallel-readdir               off
performance.rda-request-size               131072
performance.rda-low-wmark                  4096
performance.rda-high-wmark                 128KB
performance.rda-cache-limit                10MB
performance.nl-cache-positive-entry        false
performance.nl-cache-limit                 10MB
performance.nl-cache-timeout               60
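[Editor's note: the MB/s figures dd prints above are simply bytes copied divided by elapsed time, in decimal megabytes (1 MB = 10^6 bytes), not MiB. A quick sketch reproducing the three reported rates from the dd output:]

```python
# Reproduce the MB/s figures dd reports: bytes copied / elapsed seconds,
# expressed in decimal megabytes (1 MB = 1e6 bytes), as dd does.
BYTES = 1073741824  # 1 GiB, written by each dd run above

def dd_rate_mb_s(seconds):
    """Throughput in decimal MB/s, rounded to one decimal place."""
    return round(BYTES / seconds / 1e6, 1)

print(dd_rate_mb_s(1.25716))   # raw NVMe, oflag=direct  -> 854.1
print(dd_rate_mb_s(18.5137))   # GlusterFS mount, direct -> 58.0
print(dd_rate_mb_s(45.1265))   # GlusterFS mount, sync   -> 23.8
```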
WK
2020-Nov-10 00:37 UTC
[Gluster-users] Very poor GlusterFS Volume performance (glusterfs 8.2)
On 11/9/2020 12:59 PM, Marc Jakobs wrote:

> I have a GlusterFS Volume on three Linux Servers (Ubuntu 20.04LTS) which
> are connected via 1GBit/sec NIC with each other over a dedicated switch.
>
> Every server has a NVMe disk which is used for the GlusterFS Volume
> called "data".

So I assume you have a simple replica 3 setup. Are you using sharding?

> I have mounted the Volume like this
>
> mount -t glusterfs -o direct-io-mode=disable 127.0.0.1:/data /mnt/test/
>
> so it does not even go over the local NIC but instead over the loopback
> device.

You are network constrained. Your mount is direct, but if you have replica 3 the data still has to travel to the other two Gluster bricks, and that occurs over a single 1 Gbit/s Ethernet port, which has a maximum throughput of 125 MB/s. Since you have two streams going out, that is roughly 62+ MB/s each, assuming a full replica 3. My understanding is that Gluster doesn't acknowledge a write until it has been written to at least one of the replicas (I am sure others will jump in and correct me). So 60 MB/s under those circumstances is what I would expect to see.

You can improve things by using an arbiter, and supposedly the new thin arbiter is even faster (though I haven't tried it), but you lose a little safety. The arbiter node only receives the metadata so it can referee on split-brain decisions, freeing up more bandwidth for the actual data replica node.

A huge improvement would be to bond two or more Gbit/s ports. Round-robin teamd is really easy to set up, or use the traditional bonding in its various flavors. You probably have some spare NIC cards lying around, so it's usually a 'freebie'.

Of course, the best case would be to make the jump to 10 Gb/s kit.

-wk
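[Editor's note: the back-of-the-envelope numbers above can be sketched as a rough model. It assumes the client fans each write out to all remote data copies in parallel over one NIC and ignores protocol overhead; the function name and parameters are illustrative, not Gluster API:]

```python
def replica_write_ceiling_mb_s(nic_gbit=1.0, remote_data_copies=2):
    """Rough upper bound on client write throughput when each write is
    fanned out to `remote_data_copies` remote bricks over a single NIC.
    1 Gbit/s line rate is 1000/8 = 125 MB/s (decimal MB)."""
    nic_mb_s = nic_gbit * 1000 / 8
    return nic_mb_s / remote_data_copies

# Full replica 3 mounted locally: two remote data copies share the NIC.
print(replica_write_ceiling_mb_s())                       # -> 62.5
# With an arbiter, only one full data copy leaves the client over the wire.
print(replica_write_ceiling_mb_s(remote_data_copies=1))   # -> 125.0
# Bonding two 1 Gbit/s ports roughly doubles the ceiling.
print(replica_write_ceiling_mb_s(nic_gbit=2.0))           # -> 125.0
```

The observed 58.0 MB/s sits just under the 62.5 MB/s ceiling, which is consistent with the volume being network-bound rather than disk-bound.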