Vladimir Melnik
2019-Jul-03 08:39 UTC
[Gluster-users] Extremely low performance - am I doing something wrong?
Dear colleagues,

I have a lab with a bunch of virtual machines (the virtualization is provided by KVM) running on the same physical host. 4 of these VMs are working as a GlusterFS cluster and there's one more VM that works as a client. I'll specify all the packages' versions at the end of this message.

I created 2 volumes - one of type "Distributed-Replicate" and another of type "Distribute". The problem is that both volumes are showing really poor performance.

Here's what I see on the client:

$ mount | grep gluster
10.13.1.16:storage1 on /mnt/glusterfs1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.13.1.16:storage2 on /mnt/glusterfs2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

$ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1.47936 s, 7.1 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1.62546 s, 6.5 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1.71229 s, 6.1 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1.68607 s, 6.2 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1.82204 s, 5.8 MB/s

$ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs2/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs2/test.tmp; } done
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1.15739 s, 9.1 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.978528 s, 10.7 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.910642 s, 11.5 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.998249 s, 10.5 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1.03377 s, 10.1 MB/s

The distributed one shows a bit better performance than the distributed-replicated one, but it's still poor. :-(

The disk storage itself is OK; here's what I see on each of the 4 GlusterFS servers:

$ for i in {1..5}; do { dd if=/dev/zero of=/mnt/storage1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/storage1/test.tmp; } done
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0656698 s, 160 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0476927 s, 220 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.036526 s, 287 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0329145 s, 319 MB/s
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0403988 s, 260 MB/s

The network between all 5 VMs is OK; they all run on the same physical host.

I can't understand what I'm doing wrong. :-(
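For what it's worth, the numbers themselves seem to point at latency rather than throughput: ten 1 MiB oflag=sync writes in ~1.5 s works out to roughly 150 ms per write. To separate the two effects I'm also planning to try a run that syncs only once at the end instead of after every write (a sketch, not yet measured):

$ dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=100 conv=fdatasync
$ rm -f /mnt/glusterfs1/test.tmp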
Here's the detailed info about the volumes:

Volume Name: storage1
Type: Distributed-Replicate
Volume ID: a42e2554-99e5-4331-bcc4-0900d002ae32
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick1
Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage1/brick2
Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter)
Brick4: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage1/brick3
Brick5: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage1/brick4
Brick6: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage1/brick_arbiter (arbiter)
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Volume Name: storage2
Type: Distribute
Volume ID: df4d8096-ad03-493e-9e0e-586ce21fb067
Status: Started
Snapshot Count: 0
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: gluster1.k8s.maitre-d.tucha.ua:/mnt/storage2
Brick2: gluster2.k8s.maitre-d.tucha.ua:/mnt/storage2
Brick3: gluster3.k8s.maitre-d.tucha.ua:/mnt/storage2
Brick4: gluster4.k8s.maitre-d.tucha.ua:/mnt/storage2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

The OS is CentOS Linux release 7.6.1810. The packages I'm using are:
glusterfs-6.3-1.el7.x86_64
glusterfs-api-6.3-1.el7.x86_64
glusterfs-cli-6.3-1.el7.x86_64
glusterfs-client-xlators-6.3-1.el7.x86_64
glusterfs-fuse-6.3-1.el7.x86_64
glusterfs-libs-6.3-1.el7.x86_64
glusterfs-server-6.3-1.el7.x86_64
kernel-3.10.0-327.el7.x86_64
kernel-3.10.0-514.2.2.el7.x86_64
kernel-3.10.0-957.12.1.el7.x86_64
kernel-3.10.0-957.12.2.el7.x86_64
kernel-3.10.0-957.21.3.el7.x86_64
kernel-tools-3.10.0-957.21.3.el7.x86_64
kernel-tools-libs-3.10.0-957.21.3.el7.x86_64

Please be so kind as to help me understand: did I set something up wrong, or is this quite normal performance for GlusterFS?

Thanks in advance!
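P.S. I've also collected a few volume options that are sometimes suggested for this kind of workload and plan to test them one at a time (a sketch based on general tuning advice; I haven't verified whether any of them actually help on this version):

$ gluster volume set storage1 performance.client-io-threads on
$ gluster volume set storage1 client.event-threads 4
$ gluster volume set storage1 server.event-threads 4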
Vladimir Melnik
2019-Jul-03 10:20 UTC
[Gluster-users] Extremely low performance - am I doing something wrong?
Just to be exact, here are the results of iperf3 measurements between the client and one of the servers:

$ iperf3 -c gluster1
Connecting to host gluster1, port 5201
[  4] local 10.13.16.1 port 33156 connected to 10.13.1.16 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  77.8 MBytes   652 Mbits/sec    3    337 KBytes
[  4]   1.00-2.00   sec  89.7 MBytes   752 Mbits/sec    0    505 KBytes
[  4]   2.00-3.00   sec   103 MBytes   862 Mbits/sec    2    631 KBytes
[  4]   3.00-4.00   sec   104 MBytes   870 Mbits/sec    1    741 KBytes
[  4]   4.00-5.00   sec  98.8 MBytes   828 Mbits/sec    1    834 KBytes
[  4]   5.00-6.00   sec   101 MBytes   849 Mbits/sec    0    923 KBytes
[  4]   6.00-7.00   sec   102 MBytes   860 Mbits/sec    0   1005 KBytes
[  4]   7.00-8.00   sec   106 MBytes   890 Mbits/sec    0   1.06 MBytes
[  4]   8.00-9.00   sec   109 MBytes   913 Mbits/sec    0   1.13 MBytes
[  4]   9.00-10.00  sec   109 MBytes   912 Mbits/sec    0   1.20 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1000 MBytes   839 Mbits/sec    7             sender
[  4]   0.00-10.00  sec   998 MBytes   837 Mbits/sec                  receiver

iperf Done.

$ iperf3 -c gluster1 -R
Connecting to host gluster1, port 5201
Reverse mode, remote host gluster1 is sending
[  4] local 10.13.16.1 port 33160 connected to 10.13.1.16 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  58.8 MBytes   492 Mbits/sec
[  4]   1.00-2.00   sec  80.1 MBytes   673 Mbits/sec
[  4]   2.00-3.00   sec  83.8 MBytes   703 Mbits/sec
[  4]   3.00-4.00   sec  95.6 MBytes   800 Mbits/sec
[  4]   4.00-5.00   sec   102 MBytes   858 Mbits/sec
[  4]   5.00-6.00   sec   101 MBytes   850 Mbits/sec
[  4]   6.00-7.00   sec   102 MBytes   860 Mbits/sec
[  4]   7.00-8.00   sec   107 MBytes   898 Mbits/sec
[  4]   8.00-9.00   sec   106 MBytes   893 Mbits/sec
[  4]   9.00-10.00  sec   108 MBytes   904 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   949 MBytes   796 Mbits/sec    6             sender
[  4]   0.00-10.00  sec   946 MBytes   794 Mbits/sec                  receiver

iperf Done.

So, the bandwidth seems to be OK too.
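That said, since the dd tests use oflag=sync, per-operation round-trip latency probably matters more than bandwidth here. If it helps, I can also check the latency and where the time is spent on the bricks, along these lines (a sketch; profiling has to be started before re-running the dd test on the client):

$ ping -c 10 -q gluster1
$ gluster volume profile storage1 start
$ for i in {1..5}; do { dd if=/dev/zero of=/mnt/glusterfs1/test.tmp bs=1M count=10 oflag=sync; rm -f /mnt/glusterfs1/test.tmp; } done
$ gluster volume profile storage1 info
$ gluster volume profile storage1 stop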
On Wed, Jul 03, 2019 at 11:39:41AM +0300, Vladimir Melnik wrote:
> Dear colleagues,
>
> [...]

--
V.Melnik