Anastasia Belyaeva
2018-Apr-11 19:03 UTC
[Gluster-users] Unreasonably poor performance of replicated volumes
Hello everybody!

I have 3 gluster servers (gluster 3.12.6, CentOS 7.2); they are actually virtual machines located on 3 separate physical XenServer 7.1 hosts.

They are all connected via an InfiniBand network. iperf3 shows around 23 Gbit/s of network bandwidth between each pair of them.

Each server has 3 HDDs put into a stripe-3 thin pool (LVM2), with a logical volume created on top of it and formatted with xfs. Gluster top reports the following throughput:

root at fsnode2 ~ $ gluster volume top r3vol write-perf bs 4096 count 524288 list-cnt 0
Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
Throughput 631.82 MBps time 3.3989 secs
Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
Throughput 566.96 MBps time 3.7877 secs
Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
Throughput 546.65 MBps time 3.9285 secs

root at fsnode2 ~ $ gluster volume top r2vol write-perf bs 4096 count 524288 list-cnt 0
Brick: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
Throughput 539.60 MBps time 3.9798 secs
Brick: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
Throughput 580.07 MBps time 3.7021 secs

On top of that there are two pure replicated ('replica 2' and 'replica 3') volumes; the 'replica 2' volume is for testing purposes only.

Volume Name: r2vol
Type: Replicate
Volume ID: 4748d0c0-6bef-40d5-b1ec-d30e10cfddd9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fsnode2.ibnet:/data/glusterfs/r2vol/brick1/brick
Brick2: fsnode4.ibnet:/data/glusterfs/r2vol/brick1/brick
Options Reconfigured:
nfs.disable: on

Volume Name: r3vol
Type: Replicate
Volume ID: b0f64c28-57e1-4b9d-946b-26ed6b499f29
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
Brick2: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
Brick3: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
Options Reconfigured:
nfs.disable: on

The client is also gluster 3.12.6, a CentOS 7.3 virtual machine, with a FUSE mount:

root at centos7u3-nogdesktop2 ~ $ mount | grep gluster
gluster-host.ibnet:/r2vol on /mnt/gluster/r2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
gluster-host.ibnet:/r3vol on /mnt/gluster/r3 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

The problem is that there is a significant performance loss with smaller block sizes. For example:

4K block size

[replica 3 volume]
root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 11.2207 s, 95.7 MB/s

[replica 2 volume]
root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 12.0149 s, 89.4 MB/s

512K block size

[replica 3 volume]
root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=512K count=2048
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 5.27207 s, 204 MB/s

[replica 2 volume]
root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r2/file$RANDOM bs=512K count=2048
2048+0 records in
2048+0 records out
1073741824 bytes (1.1 GB) copied, 4.22321 s, 254 MB/s

With a bigger block size it's still not where I expect it to be, but at least it starts to make some sense.
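(For completeness, the dd runs above can be repeated across a range of block sizes with a small loop like the sketch below. The mount point and file naming simply follow the commands above; the loop itself is only an illustration, not part of the original tests.)

#!/bin/bash
# Sweep dd block sizes against the replica-3 FUSE mount, writing 1 GiB per run.
# Sketch only: path and naming follow the examples above.
MNT=/mnt/gluster/r3
TOTAL=$((1024 * 1024 * 1024))
for BS in 4K 16K 64K 256K 512K 1M; do
    BYTES=$(numfmt --from=iec "$BS")      # e.g. 4K -> 4096
    OUT="$MNT/ddtest_${BS}_$RANDOM"
    dd if=/dev/zero of="$OUT" bs="$BS" count=$((TOTAL / BYTES)) 2>&1 | tail -n 1
    rm -f "$OUT"
done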
I've been trying to solve this for a very long time with no luck. I've already tried both kernel tuning (different 'tuned' profiles and the settings recommended in the "Linux Kernel Tuning" section of the documentation) and tweaking gluster volume options, including write-behind/flush-behind/write-behind-window-size. The latter, to my surprise, didn't make any difference. At first I thought it was a buffering issue, but it turns out writes do get buffered, just not very efficiently (well, at least that is what it looks like in the gluster profile output below).
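(For reference, options of that kind are set per volume with 'gluster volume set'. The snippet below is only a sketch: the option names are standard GlusterFS options, but the values are illustrative examples, not settings recommended or tested in this thread.)

# Illustrative only: toggling the write-behind family of options on the r3vol volume.
gluster volume set r3vol performance.write-behind on
gluster volume set r3vol performance.flush-behind on
gluster volume set r3vol performance.write-behind-window-size 4MB
# Check what the volume currently uses:
gluster volume get r3vol performance.write-behind-window-size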
root at fsnode2 ~ $ gluster volume profile r3vol info clear
...
Cleared stats.

root at centos7u3-nogdesktop2 ~ $ dd if=/dev/zero of=/mnt/gluster/r3/file$RANDOM bs=4096 count=262144
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 10.9743 s, 97.8 MB/s

root at fsnode2 ~ $ gluster volume profile r3vol info
Brick: fsnode2.ibnet:/data/glusterfs/r3vol/brick1/brick
-------------------------------------------------------
Cumulative Stats:
   Block Size:   4096b+   8192b+   16384b+   32768b+   65536b+   131072b+
 No. of Reads:        0        0         0         0         0          0
No. of Writes:     1576     4173     19605      7777      1847        657

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ----
      0.00       0.00 us       0.00 us       0.00 us              1   RELEASE
      0.00      18.00 us      18.00 us      18.00 us              1   STATFS
      0.00      20.50 us      11.00 us      30.00 us              2   FLUSH
      0.00      22.50 us      17.00 us      28.00 us              2   FINODELK
      0.01      76.50 us      65.00 us      88.00 us              2   FXATTROP
      0.01     177.00 us     177.00 us     177.00 us              1   CREATE
      0.02      56.14 us      23.00 us     128.00 us              7   LOOKUP
      0.02     259.00 us      20.00 us     498.00 us              2   ENTRYLK
     99.94      59.23 us      17.00 us   10914.00 us          35635   WRITE

    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 1073741824 bytes

Interval 0 Stats: [identical to the Cumulative Stats above]

Brick: fsnode6.ibnet:/data/glusterfs/r3vol/brick1/brick
-------------------------------------------------------
Cumulative Stats:
   Block Size:   4096b+   8192b+   16384b+   32768b+   65536b+   131072b+
 No. of Reads:        0        0         0         0         0          0
No. of Writes:     1576     4173     19605      7777      1847        657

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ----
      0.00       0.00 us       0.00 us       0.00 us              1   RELEASE
      0.00      33.00 us      33.00 us      33.00 us              1   STATFS
      0.00      22.50 us      13.00 us      32.00 us              2   ENTRYLK
      0.00      32.00 us      26.00 us      38.00 us              2   FLUSH
      0.01      47.50 us      16.00 us      79.00 us              2   FINODELK
      0.01     157.00 us     157.00 us     157.00 us              1   CREATE
      0.01      92.00 us      70.00 us     114.00 us              2   FXATTROP
      0.03      72.57 us      39.00 us     121.00 us              7   LOOKUP
     99.94      47.97 us      15.00 us    1598.00 us          35635   WRITE

    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 1073741824 bytes

Interval 0 Stats: [identical to the Cumulative Stats above]

Brick: fsnode4.ibnet:/data/glusterfs/r3vol/brick1/brick
-------------------------------------------------------
Cumulative Stats:
   Block Size:   4096b+   8192b+   16384b+   32768b+   65536b+   131072b+
 No. of Reads:        0        0         0         0         0          0
No. of Writes:     1576     4173     19605      7777      1847        657

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ----
      0.00       0.00 us       0.00 us       0.00 us              1   RELEASE
      0.00      58.00 us      58.00 us      58.00 us              1   STATFS
      0.00      38.00 us      38.00 us      38.00 us              2   ENTRYLK
      0.01      59.00 us      32.00 us      86.00 us              2   FLUSH
      0.01      81.00 us      33.00 us     129.00 us              2   FINODELK
      0.01      91.50 us      73.00 us     110.00 us              2   FXATTROP
      0.01     239.00 us     239.00 us     239.00 us              1   CREATE
      0.04     103.14 us      63.00 us     210.00 us              7   LOOKUP
     99.92      52.99 us      16.00 us   11289.00 us          35635   WRITE

    Duration: 38 seconds
   Data Read: 0 bytes
Data Written: 1073741824 bytes

Interval 0 Stats: [identical to the Cumulative Stats above]
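(A quick back-of-the-envelope check on the profile numbers above, using the figures reported for the fsnode2 brick; this is only a sanity check of what the output already shows, not a measurement of its own.)

# 1073741824 bytes were written in 35635 WRITE FOPs at 59.23 us average latency.
echo "scale=2; 1073741824 / 35635" | bc        # ~30131 bytes (~29 KiB) per WRITE FOP,
                                               # so write-behind does aggregate the 4K writes
echo "scale=2; 35635 * 59.23 / 1000000" | bc   # ~2.11 s of cumulative brick WRITE time,
                                               # versus ~11 s of dd wall time for the run

So, on these numbers, the brick-side WRITE latency by itself does not appear to account for the ~95 MB/s result.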
At this point I have officially run out of ideas about where to look next, so any help, suggestions or pointers are highly appreciated!

--
Best regards,
Anastasia Belyaeva
Vlad Kopylov
2018-Apr-12 22:57 UTC
[Gluster-users] Unreasonably poor performance of replicated volumes
I guess you went through the user lists and already tried something like this:
http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html

I have the same exact setup, and that is as far as it got after months of trial and error. We all have roughly the same setup and the same issue with this - you can find posts like yours on a daily basis.

On Wed, Apr 11, 2018 at 3:03 PM, Anastasia Belyaeva <anastasia.blv at gmail.com> wrote:

> Hello everybody!
> [...]
Anastasia Belyaeva
2018-Apr-13 17:58 UTC
[Gluster-users] Unreasonably poor performance of replicated volumes
Thanks a lot for your reply!

You guessed it right, though - mailing lists, various blogs, documentation, videos and even the source code at this point. Changing some of the options does make performance slightly better, but nothing particularly groundbreaking.

So, if I understand you correctly, no one has yet managed to get acceptable performance (relative to the underlying hardware capabilities) with smaller block sizes? Is there an explanation for this?
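(One common way to reason about the small-block case - offered here only as an illustration, not as a confirmed explanation of these results - is that a single synchronous writer is latency-bound: if each request costs a fixed round trip before the next one is issued, throughput is capped at roughly block_size / round_trip_time. With a purely hypothetical 40 us round trip:)

# Hypothetical round trip of 40 us per request; bytes per microsecond equals MB/s.
echo "scale=2; 4096 / 40" | bc      # ~102 MB/s ceiling at bs=4K
echo "scale=2; 524288 / 40" | bc    # ~13107 MB/s ceiling at bs=512K (other limits dominate first)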
2018-04-13 1:57 GMT+03:00 Vlad Kopylov <vladkopy at gmail.com>:

> Guess you went through user lists and tried something like this already
> http://lists.gluster.org/pipermail/gluster-users/2018-April/033811.html
> [...]

--
Best regards,
Anastasia Belyaeva