----- Original Message -----
> From: "Abi Askushi" <rightkicktech at gmail.com>
> To: "Ben Turner" <bturner at redhat.com>
> Cc: "Krutika Dhananjay" <kdhananj at redhat.com>, "gluster-user" <gluster-users at gluster.org>
> Sent: Monday, September 11, 2017 1:40:42 AM
> Subject: Re: [Gluster-users] Slow performance of gluster volume
>
> I did not upgrade gluster yet. I am still using 3.8.12. Only the mentioned
> changes provided the performance boost.
>
> From which version to which version did you see such a performance boost? I
> will try to upgrade and check the difference also.

Unfortunately I didn't record the package versions, I also may have done the same thing as you :)

-b

> On Sep 11, 2017 2:45 AM, "Ben Turner" <bturner at redhat.com> wrote:
>
> Great to hear!
>
> ----- Original Message -----
> > From: "Abi Askushi" <rightkicktech at gmail.com>
> > To: "Krutika Dhananjay" <kdhananj at redhat.com>
> > Cc: "gluster-user" <gluster-users at gluster.org>
> > Sent: Friday, September 8, 2017 7:01:00 PM
> > Subject: Re: [Gluster-users] Slow performance of gluster volume
> >
> > Following changes resolved the perf issue:
> >
> > Added the option
> > /etc/glusterfs/glusterd.vol :
> > option rpc-auth-allow-insecure on
>
> Was it this setting or was it the gluster upgrade, do you know for sure?
> It may be helpful to others to know for sure (I'm interested too :).
>
> -b
>
> >
> > restarted glusterd
> >
> > Then set the volume option:
> > gluster volume set vms server.allow-insecure on
> >
> > I am now reaching the max network bandwidth and the performance of the VMs
> > is quite good.
> >
> > Did not upgrade the glusterd.
> >
> > As a next try I am thinking of upgrading gluster to 3.12 and testing the
> > libgfapi integration of qemu by upgrading to oVirt 4.1.5, then checking VM perf.
> >
> > On Sep 6, 2017 1:20 PM, "Abi Askushi" <rightkicktech at gmail.com> wrote:
> >
> > I tried to follow the steps from
> > https://wiki.centos.org/SpecialInterestGroup/Storage to install the latest
> > gluster on the first node.
> > It installed 3.10 and not 3.11. I am not sure how to install 3.11 without
> > compiling it.
> > Then when I tried to start gluster on the node, the bricks were reported
> > down (the other 2 nodes still have 3.8). Not sure why. The logs were showing
> > the below (even after rebooting the server):
> >
> > [2017-09-06 10:56:09.023777] E [rpcsvc.c:557:rpcsvc_check_and_reply_error]
> > 0-rpcsvc: rpc actor failed to complete successfully
> > [2017-09-06 10:56:09.024122] E [server-helpers.c:395:server_alloc_frame]
> > (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7f2d0ec20905]
> > -->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0x3006b)
> > [0x7f2cfa4bf06b]
> > -->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0xdb34)
> > [0x7f2cfa49cb34] ) 0-server: invalid argument: client [Invalid argument]
> >
> > Do I need to upgrade all nodes before I attempt to start the gluster
> > services?
> > I reverted the first node back to 3.8 for the moment and everything was restored.
> > Also, tests with eager lock disabled did not make any difference.
> >
> > On Wed, Sep 6, 2017 at 11:15 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
> >
> > Do you see any improvement with 3.11.1, as that has a patch that improves perf
> > for this kind of workload?
> >
> > Also, could you disable eager-lock and check if that helps? I see that max
> > time is being spent in acquiring locks.
> >
> > -Krutika
> >
> > On Wed, Sep 6, 2017 at 1:38 PM, Abi Askushi <rightkicktech at gmail.com> wrote:
> >
> > Hi Krutika,
> >
> > Is there anything in the profile indicating what is causing this bottleneck?
> > In case I can collect any other info let me know.
> >
> > Thanx
> >
> > On Sep 5, 2017 13:27, "Abi Askushi" <rightkicktech at gmail.com> wrote:
> >
> > Hi Krutika,
> >
> > Attached the profile stats. I enabled profiling then ran some dd tests. Also
> > 3 Windows VMs are running on top of this volume but I did not do any stress
> > testing on the VMs. I have left the profiling enabled in case more time is
> > needed for useful stats.
> >
> > Thanx
> >
> > On Tue, Sep 5, 2017 at 12:48 PM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
> >
> > OK my understanding is that with preallocated disks the performance with and
> > without shard will be the same.
> >
> > In any case, please attach the volume profile[1], so we can see what else is
> > slowing things down.
> >
> > -Krutika
> >
> > [1] -
> > https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
> >
> > On Tue, Sep 5, 2017 at 2:32 PM, Abi Askushi <rightkicktech at gmail.com> wrote:
> >
> > Hi Krutika,
> >
> > I already have a preallocated disk on the VM.
> > Now I am checking performance with dd on the hypervisors which have the
> > gluster volume configured.
> >
> > I also tried several values of shard-block-size and I keep getting the same
> > low values on write performance.
> > Enabling client-io-threads also did not have any effect.
> >
> > The version of gluster I am using is glusterfs 3.8.12 built on May 11 2017
> > 18:46:20.
> > The setup is a set of 3 CentOS 7.3 servers and oVirt 4.1, using gluster as
> > storage.
> >
> > Below are the current settings:
> >
> > Volume Name: vms
> > Type: Replicate
> > Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (2 + 1) = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: gluster0:/gluster/vms/brick
> > Brick2: gluster1:/gluster/vms/brick
> > Brick3: gluster2:/gluster/vms/brick (arbiter)
> > Options Reconfigured:
> > server.event-threads: 4
> > client.event-threads: 4
> > performance.client-io-threads: on
> > features.shard-block-size: 512MB
> > cluster.granular-entry-heal: enable
> > performance.strict-o-direct: on
> > network.ping-timeout: 30
> > storage.owner-gid: 36
> > storage.owner-uid: 36
> > user.cifs: off
> > features.shard: on
> > cluster.shd-wait-qlength: 10000
> > cluster.shd-max-threads: 8
> > cluster.locking-scheme: granular
> > cluster.data-self-heal-algorithm: full
> > cluster.server-quorum-type: server
> > cluster.quorum-type: auto
> > cluster.eager-lock: enable
> > network.remote-dio: off
> > performance.low-prio-threads: 32
> > performance.stat-prefetch: on
> > performance.io-cache: off
> > performance.read-ahead: off
> > performance.quick-read: off
> > transport.address-family: inet
> > performance.readdir-ahead: on
> > nfs.disable: on
> > nfs.export-volumes: on
> >
> > I observed that when testing with dd if=/dev/zero of=testfile bs=1G count=1 I
> > get 65MB/s on the vms gluster volume (and the network traffic between the
> > servers reaches ~ 500Mbps), while when testing with dd if=/dev/zero
> > of=testfile bs=1G count=1 oflag=direct I get a consistent 10MB/s and the
> > network traffic hardly reaching 100Mbps.
> >
> > Any other things one can do?
> >
> > On Tue, Sep 5, 2017 at 5:57 AM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
> >
> > I'm assuming you are using this volume to store vm images, because I see
> > shard in the options list.
> >
> > Speaking from shard translator's POV, one thing you can do to improve
> > performance is to use preallocated images.
> > This will at least eliminate the need for shard to perform multiple steps as
> > part of the writes - such as creating the shard and then writing to it and
> > then updating the aggregated file size - all of which require one network
> > call each, which further get blown up once they reach AFR (replicate) into
> > many more network calls.
> >
> > Second, I'm assuming you're using the default shard block size of 4MB (you
> > can confirm this using `gluster volume get <VOL> shard-block-size`). In our
> > tests, we've found that larger shard sizes perform better. So maybe change
> > the shard-block-size to 64MB (`gluster volume set <VOL> shard-block-size
> > 64MB`).
> >
> > Third, keep stat-prefetch enabled. We've found that qemu sends quite a lot of
> > [f]stats which can be served from the (md)cache to improve performance. So
> > enable that.
> >
> > Also, could you enable client-io-threads and see if that improves
> > performance?
> >
> > Which version of gluster are you using BTW?
> >
> > -Krutika
> >
> > On Tue, Sep 5, 2017 at 4:32 AM, Abi Askushi <rightkicktech at gmail.com> wrote:
> >
> > Hi all,
> >
> > I have a gluster volume used to host several VMs (managed through oVirt).
> > The volume is a replica 3 with arbiter and the 3 servers use a 1 Gbit network
> > for the storage.
> >
> > When testing with dd (dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct)
> > out of the volume (e.g. writing at /root/) the performance of the dd is
> > reported to be ~ 700MB/s, which is quite decent. When testing the dd on the
> > gluster volume I get ~ 43 MB/s, which is way lower than the previous. When
> > testing the gluster volume with dd, the network traffic was not exceeding
> > 450 Mbps on the network interface. I would expect to reach near 900 Mbps
> > considering that there is 1 Gbit of bandwidth available.
> > This results in VMs with very slow performance (especially on their write
> > operations).
> >
> > The full details of the volume are below. Any advice on what can be tweaked
> > will be highly appreciated.
> >
> > Volume Name: vms
> > Type: Replicate
> > Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (2 + 1) = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: gluster0:/gluster/vms/brick
> > Brick2: gluster1:/gluster/vms/brick
> > Brick3: gluster2:/gluster/vms/brick (arbiter)
> > Options Reconfigured:
> > cluster.granular-entry-heal: enable
> > performance.strict-o-direct: on
> > network.ping-timeout: 30
> > storage.owner-gid: 36
> > storage.owner-uid: 36
> > user.cifs: off
> > features.shard: on
> > cluster.shd-wait-qlength: 10000
> > cluster.shd-max-threads: 8
> > cluster.locking-scheme: granular
> > cluster.data-self-heal-algorithm: full
> > cluster.server-quorum-type: server
> > cluster.quorum-type: auto
> > cluster.eager-lock: enable
> > network.remote-dio: off
> > performance.low-prio-threads: 32
> > performance.stat-prefetch: off
> > performance.io-cache: off
> > performance.read-ahead: off
> > performance.quick-read: off
> > transport.address-family: inet
> > performance.readdir-ahead: on
> > nfs.disable: on
> > nfs.export-volumes: on
> >
> > Thanx,
> > Alex
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
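For anyone who wants to repeat the two changes reported in this thread, here is a minimal shell sketch. It is an illustration only, not something posted by the participants: the helper name `add_insecure_option` is made up, and it assumes that glusterd.vol keeps its options in a single block closed by a line starting with `end-volume`. The thread does not establish why the change helped, so treat this as the steps that were reported to work on 3.8.12 and nothing more.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): add the option reported above to a
# glusterd.vol-style file, once, just before the closing "end-volume" line.
add_insecure_option() {
    vol_file="$1"
    # Do nothing if the option is already present (with any value), so the
    # helper is safe to re-run.
    if grep -q 'option rpc-auth-allow-insecure' "$vol_file"; then
        return 0
    fi
    tmp_file=$(mktemp) || return 1
    awk '/^end-volume/ { print "    option rpc-auth-allow-insecure on" } { print }' \
        "$vol_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp_file" > "$vol_file"
    rm -f "$tmp_file"
}

# Order used in the thread, to be run as root on the servers. Shown as comments
# so that sourcing this file changes nothing:
#   add_insecure_option /etc/glusterfs/glusterd.vol
#   systemctl restart glusterd     # thread only says "restarted glusterd"; the exact command is an assumption
#   gluster volume set vms server.allow-insecure on
```

Keeping a copy of the original glusterd.vol before editing it makes it easy to revert if the change makes no difference on your setup.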
Dmitri Chebotarov
2017-Sep-11 16:27 UTC
[Gluster-users] Slow performance of gluster volume
Hi Abi

Can you please share your current transfer speeds after you made the change?

Thank you.
Hi Dmitri, I was getting 8 - 10 MB/s on the gluster mount before the changes, which was really slow and the VMs where sluggish and impractical to use. After the changes, I am getting 70 MB/s which is the expected considering I am using 1 Gbit network for the storage. The VMs also became way more responsive and are giving a normal user experience. For a quick test on the mount point on the host I am using: dd if=/dev/zero of=testfile bs=1M count=1024 oflag=direct On Mon, Sep 11, 2017 at 7:27 PM, Dmitri Chebotarov <4dimach at gmail.com> wrote:> Hi Abi > > Can you please share your current transfer speeds after you made the > change? > > Thank you. > > On Mon, Sep 11, 2017 at 9:55 AM, Ben Turner <bturner at redhat.com> wrote: > >> ----- Original Message ----- >> > From: "Abi Askushi" <rightkicktech at gmail.com> >> > To: "Ben Turner" <bturner at redhat.com> >> > Cc: "Krutika Dhananjay" <kdhananj at redhat.com>, "gluster-user" < >> gluster-users at gluster.org> >> > Sent: Monday, September 11, 2017 1:40:42 AM >> > Subject: Re: [Gluster-users] Slow performance of gluster volume >> > >> > Did not upgrade yet gluster. I am still using 3.8.12. Only the >> mentioned >> > changes did provide the performance boost. >> > >> > From which version to which version did you see such performance boost? >> I >> > will try to upgrade and check difference also. >> >> Unfortunately I didn't record the package versions, I also may have done >> the same thing as you :) >> >> -b >> >> > >> > On Sep 11, 2017 2:45 AM, "Ben Turner" <bturner at redhat.com> wrote: >> > >> > Great to hear! 
>> > >> > ----- Original Message ----- >> > > From: "Abi Askushi" <rightkicktech at gmail.com> >> > > To: "Krutika Dhananjay" <kdhananj at redhat.com> >> > > Cc: "gluster-user" <gluster-users at gluster.org> >> > > Sent: Friday, September 8, 2017 7:01:00 PM >> > > Subject: Re: [Gluster-users] Slow performance of gluster volume >> > > >> > > Following changes resolved the perf issue: >> > > >> > > Added the option >> > > /etc/glusterfs/glusterd.vol : >> > > option rpc-auth-allow-insecure on >> > >> > Was it this setting or was it the gluster upgrade, do you know for sure? >> > It may be helpful to others to know for sure(Im interested too:). >> > >> > -b >> > >> > > >> > > restarted glusterd >> > > >> > > Then set the volume option: >> > > gluster volume set vms server.allow-insecure on >> > > >> > > I am reaching now the max network bandwidth and performance of VMs is >> > quite >> > > good. >> > > >> > > Did not upgrade the glusterd. >> > > >> > > As a next try I am thinking to upgrade gluster to 3.12 + test libgfapi >> > > integration of qemu by upgrading to ovirt 4.1.5 and check vm perf. >> > > >> > > >> > > On Sep 6, 2017 1:20 PM, "Abi Askushi" < rightkicktech at gmail.com > >> wrote: >> > > >> > > >> > > >> > > I tried to follow step from >> > > https://wiki.centos.org/SpecialInterestGroup/Storage to install >> latest >> > > gluster on the first node. >> > > It installed 3.10 and not 3.11. I am not sure how to install 3.11 >> without >> > > compiling it. >> > > Then when tried to start the gluster on the node the bricks were >> reported >> > > down (the other 2 nodes have still 3.8). No sure why. 
The logs were >> > showing >> > > the below (even after rebooting the server): >> > > >> > > [2017-09-06 10:56:09.023777] E [rpcsvc.c:557:rpcsvc_check_and >> _reply_error] >> > > 0-rpcsvc: rpc actor failed to complete successfully >> > > [2017-09-06 10:56:09.024122] E [server-helpers.c:395:server_a >> lloc_frame] >> > > (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) >> [0x7f2d0ec20905] >> > > -->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0x3006b) >> > > [0x7f2cfa4bf06b] >> > > -->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0xdb34) >> > > [0x7f2cfa49cb34] ) 0-server: invalid argument: client [Invalid >> argument] >> > > >> > > Do I need to upgrade all nodes before I attempt to start the gluster >> > > services? >> > > I reverted the first node back to 3.8 at the moment and all restored. >> > > Also tests with eager lock disabled did not make any difference. >> > > >> > > >> > > >> > > >> > > On Wed, Sep 6, 2017 at 11:15 AM, Krutika Dhananjay < >> kdhananj at redhat.com > >> > > wrote: >> > > >> > > >> > > >> > > Do you see any improvement with 3.11.1 as that has a patch that >> improves >> > perf >> > > for this kind of a workload >> > > >> > > Also, could you disable eager-lock and check if that helps? I see >> that max >> > > time is being spent in acquiring locks. >> > > >> > > -Krutika >> > > >> > > On Wed, Sep 6, 2017 at 1:38 PM, Abi Askushi < rightkicktech at gmail.com >> > >> > > wrote: >> > > >> > > >> > > >> > > Hi Krutika, >> > > >> > > Is it anything in the profile indicating what is causing this >> bottleneck? >> > In >> > > case i can collect any other info let me know. >> > > >> > > Thanx >> > > >> > > On Sep 5, 2017 13:27, "Abi Askushi" < rightkicktech at gmail.com > >> wrote: >> > > >> > > >> > > >> > > Hi Krutika, >> > > >> > > Attached the profile stats. I enabled profiling then ran some dd >> tests. >> > Also >> > > 3 Windows VMs are running on top this volume but did not do any stress >> > > testing on the VMs. 
I have left the profiling enabled in case more >> time is >> > > needed for useful stats. >> > > >> > > Thanx >> > > >> > > On Tue, Sep 5, 2017 at 12:48 PM, Krutika Dhananjay < >> kdhananj at redhat.com > >> > > wrote: >> > > >> > > >> > > >> > > OK my understanding is that with preallocated disks the performance >> with >> > and >> > > without shard will be the same. >> > > >> > > In any case, please attach the volume profile[1], so we can see what >> else >> > is >> > > slowing things down. >> > > >> > > -Krutika >> > > >> > > [1] - >> > > https://gluster.readthedocs.io/en/latest/Administrator% >> > 20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >> > > >> > > On Tue, Sep 5, 2017 at 2:32 PM, Abi Askushi < rightkicktech at gmail.com >> > >> > > wrote: >> > > >> > > >> > > >> > > Hi Krutika, >> > > >> > > I already have a preallocated disk on VM. >> > > Now I am checking performance with dd on the hypervisors which have >> the >> > > gluster volume configured. >> > > >> > > I tried also several values of shard-block-size and I keep getting the >> > same >> > > low values on write performance. >> > > Enabling client-io-threads also did not have any affect. >> > > >> > > The version of gluster I am using is glusterfs 3.8.12 built on May 11 >> 2017 >> > > 18:46:20. >> > > The setup is a set of 3 Centos 7.3 servers and ovirt 4.1, using >> gluster as >> > > storage. 
>> > > >> > > Below are the current settings: >> > > >> > > >> > > Volume Name: vms >> > > Type: Replicate >> > > Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b >> > > Status: Started >> > > Snapshot Count: 0 >> > > Number of Bricks: 1 x (2 + 1) = 3 >> > > Transport-type: tcp >> > > Bricks: >> > > Brick1: gluster0:/gluster/vms/brick >> > > Brick2: gluster1:/gluster/vms/brick >> > > Brick3: gluster2:/gluster/vms/brick (arbiter) >> > > Options Reconfigured: >> > > server.event-threads: 4 >> > > client.event-threads: 4 >> > > performance.client-io-threads: on >> > > features.shard-block-size: 512MB >> > > cluster.granular-entry-heal: enable >> > > performance.strict-o-direct: on >> > > network.ping-timeout: 30 >> > > storage.owner-gid: 36 >> > > storage.owner-uid: 36 >> > > user.cifs: off >> > > features.shard: on >> > > cluster.shd-wait-qlength: 10000 >> > > cluster.shd-max-threads: 8 >> > > cluster.locking-scheme: granular >> > > cluster.data-self-heal-algorithm: full >> > > cluster.server-quorum-type: server >> > > cluster.quorum-type: auto >> > > cluster.eager-lock: enable >> > > network.remote-dio: off >> > > performance.low-prio-threads: 32 >> > > performance.stat-prefetch: on >> > > performance.io-cache: off >> > > performance.read-ahead: off >> > > performance.quick-read: off >> > > transport.address-family: inet >> > > performance.readdir-ahead: on >> > > nfs.disable: on >> > > nfs.export-volumes: on >> > > >> > > >> > > I observed that when testing with dd if=/dev/zero of=testfile bs=1G >> > count=1 I >> > > get 65MB/s on the vms gluster volume (and the network traffic between >> the >> > > servers reaches ~ 500Mbps), while when testing with dd if=/dev/zero >> > > of=testfile bs=1G count=1 oflag=direct I get a consistent 10MB/s and >> the >> > > network traffic hardly reaching 100Mbps. >> > > >> > > Any other things one can do? 
>> > > >> > > On Tue, Sep 5, 2017 at 5:57 AM, Krutika Dhananjay < >> kdhananj at redhat.com > >> > > wrote: >> > > >> > > >> > > >> > > I'm assuming you are using this volume to store vm images, because I >> see >> > > shard in the options list. >> > > >> > > Speaking from shard translator's POV, one thing you can do to improve >> > > performance is to use preallocated images. >> > > This will at least eliminate the need for shard to perform multiple >> steps >> > as >> > > part of the writes - such as creating the shard and then writing to >> it and >> > > then updating the aggregated file size - all of which require one >> network >> > > call each, which further get blown up once they reach AFR (replicate) >> into >> > > many more network calls. >> > > >> > > Second, I'm assuming you're using the default shard block size of 4MB >> (you >> > > can confirm this using `gluster volume get <VOL> shard-block-size`). >> In >> > our >> > > tests, we've found that larger shard sizes perform better. So maybe >> change >> > > the shard-block-size to 64MB (`gluster volume set <VOL> >> shard-block-size >> > > 64MB`). >> > > >> > > Third, keep stat-prefetch enabled. We've found that qemu sends quite a >> > lot of >> > > [f]stats which can be served from the (md)cache to improve >> performance. So >> > > enable that. >> > > >> > > Also, could you also enable client-io-threads and see if that improves >> > > performance? >> > > >> > > Which version of gluster are you using BTW? >> > > >> > > -Krutika >> > > >> > > >> > > On Tue, Sep 5, 2017 at 4:32 AM, Abi Askushi < rightkicktech at gmail.com >> > >> > > wrote: >> > > >> > > >> > > >> > > Hi all, >> > > >> > > I have a gluster volume used to host several VMs (managed through >> oVirt). >> > > The volume is a replica 3 with arbiter and the 3 servers use 1 Gbit >> > network >> > > for the storage. >> > > >> > > When testing with dd (dd if=/dev/zero of=testfile bs=1G count=1 >> > oflag=direct) >> > > out of the volume (e.g. 
>> > > writing at /root/) the performance of the dd is reported to be ~ 700MB/s,
>> > > which is quite decent. When testing the dd on the gluster volume I get
>> > > ~ 43 MB/s, which is way lower than the previous. When testing with dd on
>> > > the gluster volume, the network traffic was not exceeding 450 Mbps on the
>> > > network interface. I would expect to reach near 900 Mbps considering that
>> > > there is 1 Gbit of bandwidth available. This results in VMs with very slow
>> > > performance (especially on their write operations).
>> > >
>> > > The full details of the volume are below. Any advice on what can be
>> > > tweaked will be highly appreciated.
>> > >
>> > > Volume Name: vms
>> > > Type: Replicate
>> > > Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
>> > > Status: Started
>> > > Snapshot Count: 0
>> > > Number of Bricks: 1 x (2 + 1) = 3
>> > > Transport-type: tcp
>> > > Bricks:
>> > > Brick1: gluster0:/gluster/vms/brick
>> > > Brick2: gluster1:/gluster/vms/brick
>> > > Brick3: gluster2:/gluster/vms/brick (arbiter)
>> > > Options Reconfigured:
>> > > cluster.granular-entry-heal: enable
>> > > performance.strict-o-direct: on
>> > > network.ping-timeout: 30
>> > > storage.owner-gid: 36
>> > > storage.owner-uid: 36
>> > > user.cifs: off
>> > > features.shard: on
>> > > cluster.shd-wait-qlength: 10000
>> > > cluster.shd-max-threads: 8
>> > > cluster.locking-scheme: granular
>> > > cluster.data-self-heal-algorithm: full
>> > > cluster.server-quorum-type: server
>> > > cluster.quorum-type: auto
>> > > cluster.eager-lock: enable
>> > > network.remote-dio: off
>> > > performance.low-prio-threads: 32
>> > > performance.stat-prefetch: off
>> > > performance.io-cache: off
>> > > performance.read-ahead: off
>> > > performance.quick-read: off
>> > > transport.address-family: inet
>> > > performance.readdir-ahead: on
>> > > nfs.disable: on
>> > > nfs.export-volumes: on
>> > >
>> > > Thanx,
>> > > Alex
>> > >
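To put the 43 MB/s result next to the 900 Mbps expectation from the original message, the figures can be expressed in the same unit. The sketch below is illustrative only: `link_pct` and `mbit_to_mb` are hypothetical helper names, dd's MB is assumed to be 10^6 bytes, and integer arithmetic truncates the results.

```shell
#!/bin/sh
# Share of a link's nominal rate used by a dd throughput figure, in percent.
# $1 = dd throughput in MB/s (decimal, assumed), $2 = link rate in Mbit/s.
# Counts payload only; replication traffic and protocol overhead are ignored.
link_pct() {
    echo $(( $1 * 8 * 100 / $2 ))
}

# Convert a line rate in Mbit/s to MB/s (truncated to a whole number).
mbit_to_mb() {
    echo $(( $1 / 8 ))
}

link_pct 43 1000   # dd on the gluster volume over 1 Gbit  -> 34
mbit_to_mb 900     # the hoped-for 900 Mbps in dd's units   -> 112
```

Under these assumptions the volume was writing at roughly a third of the link's nominal rate, and a dd result of about 112 MB/s would correspond to the expected 900 Mbps.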
>> > > _______________________________________________
>> > > Gluster-users mailing list
>> > > Gluster-users at gluster.org
>> > > http://lists.gluster.org/mailman/listinfo/gluster-users