Strahil
2020-Jan-05 09:40 UTC
[Gluster-users] Performance tuning suggestions for nvme on aws
On Jan 5, 2020 03:05, Michael Richardson <hello at mikerichardson.com.au> wrote:
>
> Hi all!
>
> I'm experimenting with GFS for the first time and have built a simple three-node cluster using AWS 'i3en' type instances. These instances provide raw nvme devices that are incredibly fast.
>
> What I'm finding in these tests is that gluster is offering only a fraction of the raw nvme performance in a 3 replica set (ie, 3 nodes with 1 brick each). I'm wondering if there is anything I can do to squeeze more performance out.
>
> For testing, I'm running fio using a 16GB test file with a 75/25 read/write split. Basically I'm trying to replicate a MySQL database, which is what I'd ideally like to host here (which I realise is probably not practical).
>
> My fio test command is:
> $ fio --name=fio-test2 --filename=fio-test \
>     --randrepeat=1 \
>     --ioengine=libaio \
>     --direct=1 \
>     --runtime=300 \
>     --bs=16k \
>     --iodepth=64 \
>     --size=16G \
>     --readwrite=randrw \
>     --rwmixread=75 \
>     --group_reporting \
>     --numjobs=4
>
> When I test this command directly on the nvme disk, I get:
>
>    READ: bw=313MiB/s (328MB/s), 313MiB/s-313MiB/s (328MB/s-328MB/s), io=47.0GiB (51.5GB), run=156806-156806msec
>   WRITE: bw=105MiB/s (110MB/s), 105MiB/s-105MiB/s (110MB/s-110MB/s), io=16.0GiB (17.2GB), run=156806-156806msec
>
> When I install the disk into a gluster 3-replica volume, I get:
>
>    READ: bw=86.3MiB/s (90.5MB/s), 86.3MiB/s-86.3MiB/s (90.5MB/s-90.5MB/s), io=25.3GiB (27.2GB), run=300002-300002msec
>   WRITE: bw=28.9MiB/s (30.3MB/s), 28.9MiB/s-28.9MiB/s (30.3MB/s-30.3MB/s), io=8676MiB (9098MB), run=300002-300002msec
>
> If I do the same but with only 2 replicas, I get the same performance results. I also get the same rough values when doing 'read', 'randread', 'write', and 'randwrite' tests.
>
> I'm testing directly on one of the storage nodes, so there are no variables like client/server network performance in the mix.
>
> I ran the same test with EBS volumes and I saw similar performance drops when offering up the volume using gluster. A "Provisioned IOPS" EBS volume that could offer 10,000 IOPS directly was getting only about 3500 IOPS when running as part of a gluster volume.
>
> We're using TLS on the management and volume connections, but I'm not seeing any CPU or memory constraint when using these volumes, so I don't believe that is the bottleneck. Similarly, when I try with SSL turned off, I see no change in performance.
>
> Does anyone have any suggestions on things I might try to increase performance when using these very fast disks as part of a gluster volume, or is this to be expected when factoring in all the extra work that gluster needs to do when replicating data around volumes?

1. Gluster & OS version?

2. Check the I/O scheduler of the NVMes -> it should be none/noop (a quick way to check and change it is shown below).

3. gluster volume set volname group db-workload

[root at ovirt1 ~]# cat /var/lib/glusterd/groups/db-workload
performance.open-behind=on
performance.write-behind=off
performance.stat-prefetch=off
performance.quick-read=off
performance.strict-o-direct=on
performance.read-ahead=off
performance.io-cache=off
performance.readdir-ahead=off
performance.client-io-threads=on
server.event-threads=4
client.event-threads=4
performance.read-after-open=yes

4. Afterwards you can test different values for server/client event-threads (based on CPU cores) - see the example below.
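For point 2, a minimal check could look like this (untested sketch; nvme0n1 and the udev rule file name are only examples - use whatever device actually backs your brick):

# show the available schedulers; the active one is printed in square brackets
cat /sys/block/nvme0n1/queue/scheduler

# switch to 'none' at runtime
echo none > /sys/block/nvme0n1/queue/scheduler

# persist it across reboots with a udev rule, for example in
# /etc/udev/rules.d/60-nvme-scheduler.rules:
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"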
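For points 3 and 4, something along these lines (the volume name 'datavol' and the value 8 are just placeholders - match the event-threads to your core count and re-run fio after each change):

# apply the whole db-workload option group to the volume
gluster volume set datavol group db-workload

# confirm which options were applied
gluster volume info datavol

# then experiment with the thread counts, e.g. on an 8-core node:
gluster volume set datavol server.event-threads 8
gluster volume set datavol client.event-threads 8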
> Thanks very much for your time!! I'll put the two full fio outputs below if anyone wants more details.
>
> Mike
>
>
> - First full fio test, nvme device without gluster
>
> fio-test: (groupid=0, jobs=4): err= 0: pid=5636: Sat Jan 4 23:09:18 2020
>   read: IOPS=20.0k, BW=313MiB/s (328MB/s)(47.0GiB/156806msec)
>     slat (usec): min=3, max=6476, avg=88.44, stdev=326.96
>     clat (usec): min=218, max=89292, avg=11141.58, stdev=1871.14
>      lat (usec): min=226, max=89311, avg=11230.16, stdev=1883.88
>     clat percentiles (usec):
>      |  1.00th=[ 3654],  5.00th=[ 8455], 10.00th=[ 9372], 20.00th=[10159],
>      | 30.00th=[10552], 40.00th=[10814], 50.00th=[11076], 60.00th=[11338],
>      | 70.00th=[11731], 80.00th=[12256], 90.00th=[13042], 95.00th=[13960],
>      | 99.00th=[15795], 99.50th=[16581], 99.90th=[19268], 99.95th=[23200],
>      | 99.99th=[36439]
>    bw (  KiB/s): min=75904, max=257120, per=25.00%, avg=80178.59, stdev=9421.58, samples=1252
>    iops        : min= 4744, max=16070, avg=5011.15, stdev=588.85, samples=1252
>   write: IOPS=6702, BW=105MiB/s (110MB/s)(16.0GiB/156806msec); 0 zone resets
>     slat (usec): min=4, max=5587, avg=88.52, stdev=325.86
>     clat (usec): min=54, max=29847, avg=4491.18, stdev=1481.06
>      lat (usec): min=63, max=29859, avg=4579.83, stdev=1508.50
>     clat percentiles (usec):
>      |  1.00th=[  947],  5.00th=[ 1975], 10.00th=[ 2737], 20.00th=[ 3458],
>      | 30.00th=[ 3916], 40.00th=[ 4178], 50.00th=[ 4424], 60.00th=[ 4686],
>      | 70.00th=[ 5014], 80.00th=[ 5473], 90.00th=[ 6259], 95.00th=[ 6980],
>      | 99.00th=[ 8717], 99.50th=[ 9503], 99.90th=[10945], 99.95th=[11600],
>      | 99.99th=[13698]
>    bw (  KiB/s): min=23296, max=86432, per=25.00%, avg=26812.24, stdev=3375.69, samples=1252
>    iops        : min= 1456, max= 5402, avg=1675.75, stdev=210.98, samples=1252
>   lat (usec)   : 100=0.01%, 250=0.01%, 500=0.06%, 750=0.11%, 1000=0.10%
>   lat (msec)   : 2=1.12%, 4=7.69%, 10=28.88%, 20=61.95%, 50=0.06%
>   lat (msec)   : 100=0.01%
>   cpu          : usr=1.56%, sys=7.85%, ctx=1905114, majf=0, minf=56
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>      issued rwts: total=3143262,1051042,0,0 short=0,0,0,0 dropped=0,0,0,0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
>    READ: bw=313MiB/s (328MB/s), 313MiB/s-313MiB/s (328MB/s-328MB/s), io=47.0GiB (51.5GB), run=156806-156806msec
>   WRITE: bw=105MiB/s (110MB/s), 105MiB/s-105MiB/s (110MB/s-110MB/s), io=16.0GiB (17.2GB), run=156806-156806msec
>
> Disk stats (read/write):
>     dm-4: ios=3455484/1154933, merge=0/0, ticks=35815316/4420412, in_queue=40257384, util=100.00%, aggrios=3456894/1155354, aggrmerge=0/0, aggrticks=35806896/4414972, aggrin_queue=40309192, aggrutil=99.99%
>     dm-2: ios=3456894/1155354, merge=0/0, ticks=35806896/4414972, in_queue=40309192, util=99.99%, aggrios=1728447/577677, aggrmerge=0/0, aggrticks=17902352/2207092, aggrin_queue=20122108, aggrutil=100.00%
>   dm-1: ios=3456894/1155354, merge=0/0, ticks=35804704/4414184, in_queue=40244216, util=100.00%, aggrios=3143273/1051086, aggrmerge=313621/104268, aggrticks=32277972/3937619, aggrin_queue=36289488, aggrutil=100.00%
>   nvme0n1: ios=3143273/1051086, merge=313621/104268, ticks=32277972/3937619, in_queue=36289488, util=100.00%
>   dm-0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>
> - Second full fio test, nvme device as part of a gluster volume
>
> fio-test2: (groupid=0, jobs=4): err= 0: pid=5537: Sat Jan 4 23:30:28 2020
>   read: IOPS=5525, BW=86.3MiB/s (90.5MB/s)(25.3GiB/300002msec)
>     slat (nsec): min=1159, max=894687k, avg=9822.60, stdev=990825.87
>     clat (usec): min=963, max=3141.5k, avg=37455.28, stdev=123109.88
>      lat (usec): min=968, max=3141.5k, avg=37465.21, stdev=123121.94
>     clat percentiles (msec):
>      |  1.00th=[    7],  5.00th=[    8], 10.00th=[    8], 20.00th=[    9],
>      | 30.00th=[    9], 40.00th=[    9], 50.00th=[   10], 60.00th=[   10],
>      | 70.00th=[   11], 80.00th=[   12], 90.00th=[   48], 95.00th=[  180],
>      | 99.00th=[  642], 99.50th=[  860], 99.90th=[ 1435], 99.95th=[ 1687],
>      | 99.99th=[ 2022]
>    bw (  KiB/s): min=   31, max=93248, per=26.30%, avg=23247.24, stdev=20716.86, samples=2280
>    iops        : min=    1, max= 5828, avg=1452.92, stdev=1294.81, samples=2280
>   write: IOPS=1850, BW=28.9MiB/s (30.3MB/s)(8676MiB/300002msec); 0 zone resets
>     slat (usec): min=21, max=1586.3k, avg=2117.71, stdev=23082.86
>     clat (usec): min=20, max=2614.0k, avg=23888.03, stdev=99651.34
>      lat (usec): min=225, max=3141.2k, avg=26006.49, stdev=104758.57
>     clat percentiles (usec):
>      |  1.00th=[    889],  5.00th=[   2343], 10.00th=[   3654],
>      | 20.00th=[   5276], 30.00th=[   5997], 40.00th=[   6456],
>      | 50.00th=[   6849], 60.00th=[   7177], 70.00th=[   7504],
>      | 80.00th=[   7963], 90.00th=[   8979], 95.00th=[  74974],
>      | 99.00th=[ 513803], 99.50th=[ 717226], 99.90th=[1333789],
>      | 99.95th=[1518339], 99.99th=[1803551]
>    bw (  KiB/s): min=   31, max=30240, per=27.05%, avg=8009.39, stdev=6912.26, samples=2217
>    iops        : min=    1, max= 1890, avg=500.56, stdev=432.02, samples=2217
>   lat (usec)   : 50=0.03%, 100=0.02%, 250=0.01%, 500=0.06%, 750=0.08%
>   lat (usec)   : 1000=0.11%
>   lat (msec)   : 2=0.66%, 4=1.97%, 10=71.07%, 20=14.47%, 50=2.69%
>   lat (msec)   : 100=2.23%, 250=3.21%, 500=1.94%, 750=0.82%, 1000=0.31%
>   cpu          : usr=0.59%, sys=1.19%, ctx=1172180, majf=0, minf=56
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>      issued rwts: total=1657579,555275,0,0 short=0,0,0,0 dropped=0,0,0,0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
>    READ: bw=86.3MiB/s (90.5MB/s), 86.3MiB/s-86.3MiB/s (90.5MB/s-90.5MB/s), io=25.3GiB (27.2GB), run=300002-300002msec
>   WRITE: bw=28.9MiB/s (30.3MB/s), 28.9MiB/s-28.9MiB/s (30.3MB/s-30.3MB/s), io=8676MiB (9098MB), run=300002-300002msec

Best Regards,
Strahil Nikolov