Raghavendra Gowdappa
2018-Mar-20 03:56 UTC
[Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)
On Tue, Mar 20, 2018 at 8:57 AM, Sam McLeod <mailinglists at smcleod.net> wrote:

> Hi Raghavendra,
>
>> On 20 Mar 2018, at 1:55 pm, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:
>>
>> Aggregating a large number of small writes by write-behind into large writes has been merged on master:
>> https://github.com/gluster/glusterfs/issues/364
>>
>> We'd like to know whether it helps for this use case. Note that it's not part of any release yet, so you'd have to build and install from the repo.
>
> Sounds interesting. I'm not too keen to build packages at the moment, but I've added myself as a watcher to that issue on GitHub, and once it's in a 3.x release I'll try it and let you know.
>
>> Another suggestion is to run tests with the option performance.write-behind-trickling-writes turned off:
>>
>> # gluster volume set <volname> performance.write-behind-trickling-writes off
>>
>> A word of caution, though: if your files are too small, these suggestions may not have much impact.
>
> I'm looking for documentation on this option, but all I could really find is in the source for write-behind.c:
>
> "if it is enabled (which it is), do not hold back writes if there are no outstanding requests"

Until recently this functionality, though available, couldn't be configured from the CLI; one could only change the option by editing the volume configuration file. It is now configurable through the CLI:

https://review.gluster.org/#/c/18719/

> and a note on aggregate-size stating that
>
> "aggregation won't happen if performance.write-behind-trickling-writes is turned on"
>
> What are the potentially negative performance impacts of disabling this?

Even if the aggregation option is turned off, write-behind has the capacity to aggregate writes up to a size of 128KB. But to make full use of that capacity with small-write workloads, write-behind has to wait for a while so that enough write requests accumulate to fill it. With trickling-writes enabled, write-behind still aggregates the requests it already has, but it won't wait for future writes. This means the xlators below write-behind can see writes smaller than 128KB. So, for a scenario where a small number of large writes is preferable to a large number of small writes, this can be a problem.

> --
> Sam McLeod (protoporpoise on IRC)
> https://smcleod.net
> https://twitter.com/s_mcleod
>
> Words are my own opinions and do not necessarily represent those of my employer or partners.
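For reference, checking and toggling the option from the CLI looks roughly like the following (a sketch assuming a build recent enough to expose the option via the CLI, per the review linked above; <volname> is a placeholder for the actual volume name):

  # Show the value currently in effect for the volume
  gluster volume get <volname> performance.write-behind-trickling-writes

  # Turn trickling writes off so write-behind may wait and aggregate up to its 128KB limit
  gluster volume set <volname> performance.write-behind-trickling-writes off

  # Revert to the default behaviour if the change doesn't help
  gluster volume reset <volname> performance.write-behind-trickling-writes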
Sam McLeod
2018-Mar-20 04:15 UTC
[Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)
Excellent description, thank you.

With performance.write-behind-trickling-writes ON (default):

## 4k randwrite

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=17.3MiB/s][r=0,w=4422 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=42701: Tue Mar 20 15:05:23 2018
  write: IOPS=4443, BW=17.4MiB/s (18.2MB/s)(256MiB/14748msec)
   bw (  KiB/s): min=16384, max=19184, per=99.92%, avg=17760.45, stdev=602.48, samples=29
   iops        : min= 4096, max= 4796, avg=4440.07, stdev=150.66, samples=29
  cpu          : usr=4.00%, sys=18.02%, ctx=131097, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,65536,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=17.4MiB/s (18.2MB/s), 17.4MiB/s-17.4MiB/s (18.2MB/s-18.2MB/s), io=256MiB (268MB), run=14748-14748msec

## 2k randwrite

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 2048B-2048B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=8624KiB/s][r=0,w=4312 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=42781: Tue Mar 20 15:05:57 2018
  write: IOPS=4439, BW=8880KiB/s (9093kB/s)(256MiB/29522msec)
   bw (  KiB/s): min= 6908, max= 9564, per=99.94%, avg=8874.03, stdev=428.92, samples=59
   iops        : min= 3454, max= 4782, avg=4437.00, stdev=214.44, samples=59
  cpu          : usr=2.43%, sys=18.18%, ctx=262222, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,131072,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=8880KiB/s (9093kB/s), 8880KiB/s-8880KiB/s (9093kB/s-9093kB/s), io=256MiB (268MB), run=29522-29522msec

With performance.write-behind-trickling-writes OFF:

## 4k randwrite - just over half the IOPS of having it ON.

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=44225: Tue Mar 20 15:11:04 2018
  write: IOPS=2594, BW=10.1MiB/s (10.6MB/s)(256MiB/25259msec)
   bw (  KiB/s): min= 2248, max=18728, per=100.00%, avg=10454.10, stdev=6481.14, samples=50
   iops        : min=  562, max= 4682, avg=2613.50, stdev=1620.35, samples=50
  cpu          : usr=2.29%, sys=10.09%, ctx=131141, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,65536,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=10.1MiB/s (10.6MB/s), 10.1MiB/s-10.1MiB/s (10.6MB/s-10.6MB/s), io=256MiB (268MB), run=25259-25259msec

## 2k randwrite - no noticeable change.

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 2048B-2048B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=8662KiB/s][r=0,w=4331 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=45813: Tue Mar 20 15:12:02 2018
  write: IOPS=4291, BW=8583KiB/s (8789kB/s)(256MiB/30541msec)
   bw (  KiB/s): min= 7416, max=10264, per=99.94%, avg=8577.66, stdev=618.31, samples=61
   iops        : min= 3708, max= 5132, avg=4288.84, stdev=309.15, samples=61
  cpu          : usr=2.87%, sys=15.83%, ctx=262236, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,131072,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=8583KiB/s (8789kB/s), 8583KiB/s-8583KiB/s (8789kB/s-8789kB/s), io=256MiB (268MB), run=30541-30541msec

Let me know if you'd recommend any other benchmarks comparing performance.write-behind-trickling-writes ON/OFF (just nothing that'll seriously risk locking up the whole gluster cluster, please!).

--
Sam McLeod
Please respond via email when possible.
https://smcleod.net
https://twitter.com/s_mcleod
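If it helps with further comparisons, the toggle-and-measure loop can be scripted. This is only a sketch under assumptions: VOLNAME and MOUNT are placeholders for the real volume and its client mount point, and the fio flags simply mirror the 4k job above.

#!/bin/bash
# Sketch: run the same fio randwrite job with trickling-writes on and then off.
VOLNAME=testvol
MOUNT=/mnt/gluster/testvol

for setting in on off; do
    # Apply the setting to the volume, then run the identical fio job against the mount.
    gluster volume set "$VOLNAME" performance.write-behind-trickling-writes "$setting"
    echo "=== trickling-writes ${setting} ==="
    fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test \
        --filename="$MOUNT/fio-test" --bs=4k --iodepth=32 --size=256MB \
        --readwrite=randwrite
    rm -f "$MOUNT/fio-test"
done

# Restore the default afterwards
gluster volume reset "$VOLNAME" performance.write-behind-trickling-writes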
Raghavendra Gowdappa
2018-Mar-20 07:23 UTC
[Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)
On Tue, Mar 20, 2018 at 9:45 AM, Sam McLeod <mailinglists at smcleod.net> wrote:

> Excellent description, thank you.
>
> With performance.write-behind-trickling-writes ON (default):
>
> ## 4k randwrite
>   write: IOPS=4443, BW=17.4MiB/s (18.2MB/s)(256MiB/14748msec)
>
> ## 2k randwrite
>   write: IOPS=4439, BW=8880KiB/s (9093kB/s)(256MiB/29522msec)
>
> With performance.write-behind-trickling-writes OFF:
>
> ## 4k randwrite - just over half the IOPS of having it ON.

Note that since the workload is random write, no aggregation is possible. So there is no point in waiting for future writes, and turning trickling-writes on makes sense. A better test to measure the impact of this option would be a sequential write workload.

I would guess that the smaller the writes, the more pronounced the benefit of turning this option off would be.

>   write: IOPS=2594, BW=10.1MiB/s (10.6MB/s)(256MiB/25259msec)
>
> ## 2k randwrite - no noticeable change.
>   write: IOPS=4291, BW=8583KiB/s (8789kB/s)(256MiB/30541msec)
>
> Let me know if you'd recommend any other benchmarks comparing performance.write-behind-trickling-writes ON/OFF (just nothing that'll seriously risk locking up the whole gluster cluster, please!).
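As a concrete starting point for that, a sequential-write counterpart to the jobs above could keep every flag the same and change only --readwrite (a sketch; the 4k/2k block sizes and 256MB size simply mirror the earlier commands):

# Sequential 4k writes - the case where waiting for aggregation (trickling-writes off) should help most
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=write

# Repeat with --bs=2k (or smaller) to see whether the effect grows as writes shrink
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=write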