thr3ads.net - Gluster users - [Gluster-users] Slow performance of gluster volume [Sep 2017]

Ben Turner

2017-Sep-11 13:55 UTC

[Gluster-users] Slow performance of gluster volume

----- Original Message -----
> From: "Abi Askushi" <rightkicktech at gmail.com>
> To: "Ben Turner" <bturner at redhat.com>
> Cc: "Krutika Dhananjay" <kdhananj at redhat.com>, "gluster-user" <gluster-users at gluster.org>
> Sent: Monday, September 11, 2017 1:40:42 AM
> Subject: Re: [Gluster-users] Slow performance of gluster volume
> 
> I have not upgraded gluster yet. I am still using 3.8.12. Only the mentioned
> changes provided the performance boost.
> 
> From which version to which version did you see such a performance boost? I
> will try to upgrade and check the difference as well.
Unfortunately I didn't record the package versions; I may also have done the
same thing as you :)

-b 
> 
> On Sep 11, 2017 2:45 AM, "Ben Turner" <bturner at redhat.com> wrote:
> 
> Great to hear!
> 
> ----- Original Message -----
> > From: "Abi Askushi" <rightkicktech at gmail.com>
> > To: "Krutika Dhananjay" <kdhananj at redhat.com>
> > Cc: "gluster-user" <gluster-users at gluster.org>
> > Sent: Friday, September 8, 2017 7:01:00 PM
> > Subject: Re: [Gluster-users] Slow performance of gluster volume
> >
> > Following changes resolved the perf issue:
> >
> > Added the option
> > /etc/glusterfs/glusterd.vol :
> > option rpc-auth-allow-insecure on
> 
> Was it this setting or was it the gluster upgrade? Do you know for sure?
> It may be helpful to others to know for sure (I'm interested too :).
> 
> -b
> 
> >
> > restarted glusterd
> >
> > Then set the volume option:
> > gluster volume set vms server.allow-insecure on
> >
> > I am now reaching the max network bandwidth and the performance of the VMs is
> > quite good.
> >
> > Did not upgrade the glusterd.
> >
> > As a next step I am thinking of upgrading gluster to 3.12 and testing the
> > libgfapi integration of qemu by upgrading to oVirt 4.1.5, then checking VM perf.
> >
> >
> > On Sep 6, 2017 1:20 PM, "Abi Askushi" < rightkicktech at gmail.com > wrote:
> >
> >
> >
> > I tried to follow the steps from
> > https://wiki.centos.org/SpecialInterestGroup/Storage to install the latest
> > gluster on the first node.
> > It installed 3.10 and not 3.11. I am not sure how to install 3.11 without
> > compiling it.
> > Then when I tried to start gluster on the node the bricks were reported
> > down (the other 2 nodes still have 3.8). Not sure why. The logs were showing
> > the below (even after rebooting the server):
> >
> > [2017-09-06 10:56:09.023777] E [rpcsvc.c:557:rpcsvc_check_and_reply_error]
> > 0-rpcsvc: rpc actor failed to complete successfully
> > [2017-09-06 10:56:09.024122] E [server-helpers.c:395:server_alloc_frame]
> > (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325) [0x7f2d0ec20905]
> > -->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0x3006b)
> > [0x7f2cfa4bf06b]
> > -->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0xdb34)
> > [0x7f2cfa49cb34] ) 0-server: invalid argument: client [Invalid argument]
> >
> > Do I need to upgrade all nodes before I attempt to start the gluster
> > services?
> > I have reverted the first node back to 3.8 for the moment and everything is restored.
> > Also, tests with eager lock disabled did not make any difference.
> >
> >
> >
> >
> > On Wed, Sep 6, 2017 at 11:15 AM, Krutika Dhananjay < kdhananj at redhat.com >
> > wrote:
> >
> >
> >
> > Do you see any improvement with 3.11.1, as that has a patch that improves perf
> > for this kind of workload?
> >
> > Also, could you disable eager-lock and check if that helps? I see that max
> > time is being spent in acquiring locks.
> >
> > -Krutika
> >
> > On Wed, Sep 6, 2017 at 1:38 PM, Abi Askushi < rightkicktech at gmail.com >
> > wrote:
> >
> >
> >
> > Hi Krutika,
> >
> > Is there anything in the profile indicating what is causing this bottleneck?
> > If I can collect any other info, let me know.
> >
> > Thanx
> >
> > On Sep 5, 2017 13:27, "Abi Askushi" < rightkicktech at gmail.com > wrote:
> >
> >
> >
> > Hi Krutika,
> >
> > Attached the profile stats. I enabled profiling then ran some dd tests. Also,
> > 3 Windows VMs are running on top of this volume but I did not do any stress
> > testing on the VMs. I have left profiling enabled in case more time is
> > needed for useful stats.
> >
> > Thanx
> >
> > On Tue, Sep 5, 2017 at 12:48 PM, Krutika Dhananjay < kdhananj at redhat.com >
> > wrote:
> >
> >
> >
> > OK, my understanding is that with preallocated disks the performance with and
> > without shard will be the same.
> >
> > In any case, please attach the volume profile[1], so we can see what else is
> > slowing things down.
> >
> > -Krutika
> >
> > [1] -
> > https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
> >
> > On Tue, Sep 5, 2017 at 2:32 PM, Abi Askushi < rightkicktech at gmail.com >
> > wrote:
> >
> >
> >
> > Hi Krutika,
> >
> > I already have a preallocated disk on VM.
> > Now I am checking performance with dd on the hypervisors which have the
> > gluster volume configured.
> >
> > I also tried several values of shard-block-size and I keep getting the same
> > low values on write performance.
> > Enabling client-io-threads also did not have any effect.
> >
> > The version of gluster I am using is glusterfs 3.8.12 built on May 11 2017
> > 18:46:20.
> > The setup is a set of 3 CentOS 7.3 servers and oVirt 4.1, using gluster as
> > storage.
> >
> > Below are the current settings:
> >
> >
> > Volume Name: vms
> > Type: Replicate
> > Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (2 + 1) = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: gluster0:/gluster/vms/brick
> > Brick2: gluster1:/gluster/vms/brick
> > Brick3: gluster2:/gluster/vms/brick (arbiter)
> > Options Reconfigured:
> > server.event-threads: 4
> > client.event-threads: 4
> > performance.client-io-threads: on
> > features.shard-block-size: 512MB
> > cluster.granular-entry-heal: enable
> > performance.strict-o-direct: on
> > network.ping-timeout: 30
> > storage.owner-gid: 36
> > storage.owner-uid: 36
> > user.cifs: off
> > features.shard: on
> > cluster.shd-wait-qlength: 10000
> > cluster.shd-max-threads: 8
> > cluster.locking-scheme: granular
> > cluster.data-self-heal-algorithm: full
> > cluster.server-quorum-type: server
> > cluster.quorum-type: auto
> > cluster.eager-lock: enable
> > network.remote-dio: off
> > performance.low-prio-threads: 32
> > performance.stat-prefetch: on
> > performance.io-cache: off
> > performance.read-ahead: off
> > performance.quick-read: off
> > transport.address-family: inet
> > performance.readdir-ahead: on
> > nfs.disable: on
> > nfs.export-volumes: on
> >
> >
> > I observed that when testing with dd if=/dev/zero of=testfile bs=1G count=1 I
> > get 65MB/s on the vms gluster volume (and the network traffic between the
> > servers reaches ~ 500Mbps), while when testing with dd if=/dev/zero
> > of=testfile bs=1G count=1 oflag=direct I get a consistent 10MB/s and the
> > network traffic hardly reaches 100Mbps.
> >
> > Any other things one can do?
> >
> > On Tue, Sep 5, 2017 at 5:57 AM, Krutika Dhananjay < kdhananj at redhat.com >
> > wrote:
> >
> >
> >
> > I'm assuming you are using this volume to store vm images, because I see
> > shard in the options list.
> >
> > Speaking from shard translator's POV, one thing you can do to improve
> > performance is to use preallocated images.
> > This will at least eliminate the need for shard to perform multiple steps as
> > part of the writes - such as creating the shard and then writing to it and
> > then updating the aggregated file size - all of which require one network
> > call each, which further get blown up once they reach AFR (replicate) into
> > many more network calls.
> >
> > Second, I'm assuming you're using the default shard block size of 4MB (you
> > can confirm this using `gluster volume get <VOL> shard-block-size`). In our
> > tests, we've found that larger shard sizes perform better. So maybe change
> > the shard-block-size to 64MB (`gluster volume set <VOL> shard-block-size
> > 64MB`).
> >
> > Third, keep stat-prefetch enabled. We've found that qemu sends quite a lot of
> > [f]stats which can be served from the (md)cache to improve performance. So
> > enable that.
> >
> > Also, could you also enable client-io-threads and see if that improves
> > performance?
> >
> > Which version of gluster are you using BTW?
> >
> > -Krutika
> >
> >
> > On Tue, Sep 5, 2017 at 4:32 AM, Abi Askushi < rightkicktech at gmail.com >
> > wrote:
> >
> >
> >
> > Hi all,
> >
> > I have a gluster volume used to host several VMs (managed through oVirt).
> > The volume is a replica 3 with arbiter and the 3 servers use 1 Gbit network
> > for the storage.
> >
> > When testing with dd (dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct)
> > out of the volume (e.g. writing at /root/) the performance of the dd is
> > reported to be ~ 700MB/s, which is quite decent. When testing the dd on the
> > gluster volume I get ~ 43 MB/s, which is way lower than the previous. When
> > testing the gluster volume with dd, the network traffic was not exceeding
> > 450 Mbps on the network interface. I would expect to reach near 900 Mbps
> > considering that there is 1 Gbit of bandwidth available. This results in
> > VMs with very slow performance (especially on their write operations).
> >
> > The full details of the volume are below. Any advice on what can be tweaked
> > will be highly appreciated.
> >
> > Volume Name: vms
> > Type: Replicate
> > Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (2 + 1) = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: gluster0:/gluster/vms/brick
> > Brick2: gluster1:/gluster/vms/brick
> > Brick3: gluster2:/gluster/vms/brick (arbiter)
> > Options Reconfigured:
> > cluster.granular-entry-heal: enable
> > performance.strict-o-direct: on
> > network.ping-timeout: 30
> > storage.owner-gid: 36
> > storage.owner-uid: 36
> > user.cifs: off
> > features.shard: on
> > cluster.shd-wait-qlength: 10000
> > cluster.shd-max-threads: 8
> > cluster.locking-scheme: granular
> > cluster.data-self-heal-algorithm: full
> > cluster.server-quorum-type: server
> > cluster.quorum-type: auto
> > cluster.eager-lock: enable
> > network.remote-dio: off
> > performance.low-prio-threads: 32
> > performance.stat-prefetch: off
> > performance.io-cache: off
> > performance.read-ahead: off
> > performance.quick-read: off
> > transport.address-family: inet
> > performance.readdir-ahead: on
> > nfs.disable: on
> > nfs.export-volumes: on
> >
> >
> > Thanx,
> > Alex
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
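The change Abi describes in this message (an option in /etc/glusterfs/glusterd.vol, a glusterd restart, then a volume option) could be sketched roughly as below. This is only an illustration: the function name add_allow_insecure is hypothetical, and placing the option line just before the closing "end-volume" line is an assumption about the layout of the stock glusterd.vol file. The service name and the use of systemctl are likewise assumed for a CentOS 7 host.

```shell
#!/bin/sh
# Sketch only: add "option rpc-auth-allow-insecure on" to a glusterd.vol-style
# file if it is not already there. add_allow_insecure is a hypothetical name.
add_allow_insecure() {
    volfile="$1"
    # Do nothing if the option is already present, so reruns are harmless.
    if grep -q 'rpc-auth-allow-insecure' "$volfile"; then
        return 0
    fi
    cp "$volfile" "$volfile.bak"    # keep a backup of the original file
    # Print the new option line just before the closing "end-volume" line
    # (assumed layout of the management volume block).
    awk '/^end-volume/ { print "    option rpc-auth-allow-insecure on" } { print }' \
        "$volfile.bak" > "$volfile"
}

# On each server (paths and service name assumed, as on CentOS 7):
#   add_allow_insecure /etc/glusterfs/glusterd.vol
#   systemctl restart glusterd
# Then once, from any node (command taken from the thread):
#   gluster volume set vms server.allow-insecure on
```

The commented commands at the end are left inert on purpose, since they change a live cluster; they would need to be run by hand after reviewing the edited file.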

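The profiling step requested earlier in the thread (enable profiling, run a test workload, then collect the stats) might look roughly like the sketch below. It assumes the start, info and stop subcommands of `gluster volume profile` described in the admin guide linked in the thread; the helper names run and profile_volume are hypothetical, and DRY_RUN is an illustrative switch that only prints the commands.

```shell
#!/bin/sh
# Sketch: collect a volume profile around a test workload.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# profile_volume VOLNAME (hypothetical helper name)
profile_volume() {
    vol="$1"
    run gluster volume profile "$vol" start
    # Run the workload of interest here, for example the dd test
    # from the thread, on the mounted volume.
    run gluster volume profile "$vol" info
    run gluster volume profile "$vol" stop
}
```

With DRY_RUN=1 set, calling `profile_volume vms` only lists the three commands, which is a cheap way to review them before touching a production volume.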
Dmitri Chebotarov

2017-Sep-11 16:27 UTC

[Gluster-users] Slow performance of gluster volume

Hi Abi

Can you please share your current transfer speeds after you made the change?

Thank you.


Abi Askushi

2017-Sep-12 07:27 UTC

[Gluster-users] Slow performance of gluster volume

Hi Dmitri,

I was getting 8 - 10 MB/s on the gluster mount before the changes, which
was really slow, and the VMs were sluggish and impractical to use.
After the changes, I am getting 70 MB/s, which is as expected considering I
am using a 1 Gbit network for the storage.
The VMs also became way more responsive and are giving a normal user
experience.

For a quick test on the mount point on the host I am using:
dd if=/dev/zero of=testfile bs=1M count=1024 oflag=direct
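The thread mixes MB/s (as reported by dd) and Mbps (as seen on the network interface), so a small conversion helper can make the figures easier to compare. The sketch below assumes 8 bits per byte and ignores protocol overhead and replication traffic; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: convert between megabits per second (Mbps) and megabytes per
# second (MB/s), assuming 8 bits per byte and no protocol overhead.
# Function names are hypothetical.
mbps_to_mbytes() {
    awk -v v="$1" 'BEGIN { printf "%.2f\n", v / 8 }'
}

mbytes_to_mbps() {
    awk -v v="$1" 'BEGIN { printf "%.2f\n", v * 8 }'
}

mbps_to_mbytes 1000   # raw capacity of a 1 Gbit link, in MB/s
mbytes_to_mbps 70     # the 70 MB/s reported after the change, in Mbps
mbytes_to_mbps 10     # the ~10 MB/s reported before the change, in Mbps
```

By this arithmetic a 1 Gbit link tops out at 125 MB/s before any overhead, and 70 MB/s corresponds to 560 Mbps for a single copy of the data. How much additional traffic replication adds on a given interface depends on the setup and is not modelled here.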


On Mon, Sep 11, 2017 at 7:27 PM, Dmitri Chebotarov <4dimach at gmail.com>
wrote:
> Hi Abi
>
> Can you please share your current transfer speeds after you made the
> change?
>
> Thank you.
>
> On Mon, Sep 11, 2017 at 9:55 AM, Ben Turner <bturner at redhat.com>
wrote:
>
>> ----- Original Message -----
>> > From: "Abi Askushi" <rightkicktech at gmail.com>
>> > To: "Ben Turner" <bturner at redhat.com>
>> > Cc: "Krutika Dhananjay" <kdhananj at redhat.com>,
"gluster-user" <
>> gluster-users at gluster.org>
>> > Sent: Monday, September 11, 2017 1:40:42 AM
>> > Subject: Re: [Gluster-users] Slow performance of gluster volume
>> >
>> > Did not upgrade yet gluster. I am still  using 3.8.12. Only the
>> mentioned
>> > changes did provide the performance boost.
>> >
>> > From which version to which version did you see such performance
boost?
>> I
>> > will try to upgrade and check difference also.
>>
>> Unfortunately I didn't record the package versions, I also may have
done
>> the same thing as you :)
>>
>> -b
>>
>> >
>> > On Sep 11, 2017 2:45 AM, "Ben Turner" <bturner at
redhat.com> wrote:
>> >
>> > Great to hear!
>> >
>> > ----- Original Message -----
>> > > From: "Abi Askushi" <rightkicktech at
gmail.com>
>> > > To: "Krutika Dhananjay" <kdhananj at
redhat.com>
>> > > Cc: "gluster-user" <gluster-users at
gluster.org>
>> > > Sent: Friday, September 8, 2017 7:01:00 PM
>> > > Subject: Re: [Gluster-users] Slow performance of gluster
volume
>> > >
>> > > Following changes resolved the perf issue:
>> > >
>> > > Added the option
>> > > /etc/glusterfs/glusterd.vol :
>> > > option rpc-auth-allow-insecure on
>> >
>> > Was it this setting or was it the gluster upgrade, do you know for
sure?
>> > It may be helpful to others to know for sure(Im interested too:).
>> >
>> > -b
>> >
>> > >
>> > > restarted glusterd
>> > >
>> > > Then set the volume option:
>> > > gluster volume set vms server.allow-insecure on
>> > >
>> > > I am reaching now the max network bandwidth and performance
of VMs is
>> > quite
>> > > good.
>> > >
>> > > Did not upgrade the glusterd.
>> > >
>> > > As a next try I am thinking to upgrade gluster to 3.12 + test
libgfapi
>> > > integration of qemu by upgrading to ovirt 4.1.5 and check vm
perf.
>> > >
>> > >
>> > > On Sep 6, 2017 1:20 PM, "Abi Askushi" <
rightkicktech at gmail.com >
>> wrote:
>> > >
>> > >
>> > >
>> > > I tried to follow step from
>> > > https://wiki.centos.org/SpecialInterestGroup/Storage to
install
>> latest
>> > > gluster on the first node.
>> > > It installed 3.10 and not 3.11. I am not sure how to install
3.11
>> without
>> > > compiling it.
>> > > Then when tried to start the gluster on the node the bricks
were
>> reported
>> > > down (the other 2 nodes have still 3.8). No sure why. The
logs were
>> > showing
>> > > the below (even after rebooting the server):
>> > >
>> > > [2017-09-06 10:56:09.023777] E [rpcsvc.c:557:rpcsvc_check_and
>> _reply_error]
>> > > 0-rpcsvc: rpc actor failed to complete successfully
>> > > [2017-09-06 10:56:09.024122] E [server-helpers.c:395:server_a
>> lloc_frame]
>> > > (-->/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325)
>> [0x7f2d0ec20905]
>> > >
-->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0x3006b)
>> > > [0x7f2cfa4bf06b]
>> > >
-->/usr/lib64/glusterfs/3.10.5/xlator/protocol/server.so(+0xdb34)
>> > > [0x7f2cfa49cb34] ) 0-server: invalid argument: client
[Invalid
>> argument]
>> > >
>> > > Do I need to upgrade all nodes before I attempt to start the
gluster
>> > > services?
>> > > I reverted the first node back to 3.8 at the moment and all
restored.
>> > > Also tests with eager lock disabled did not make any
difference.
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Sep 6, 2017 at 11:15 AM, Krutika Dhananjay <
>> kdhananj at redhat.com >
>> > > wrote:
>> > >
>> > >
>> > >
>> > > Do you see any improvement with 3.11.1 as that has a patch
that
>> improves
>> > perf
>> > > for this kind of a workload
>> > >
>> > > Also, could you disable eager-lock and check if that helps? I
see
>> that max
>> > > time is being spent in acquiring locks.
>> > >
>> > > -Krutika
>> > >
>> > > On Wed, Sep 6, 2017 at 1:38 PM, Abi Askushi <
rightkicktech at gmail.com
>> >
>> > > wrote:
>> > >
>> > >
>> > >
>> > > Hi Krutika,
>> > >
>> > > Is it anything in the profile indicating what is causing this
>> bottleneck?
>> > In
>> > > case i can collect any other info let me know.
>> > >
>> > > Thanx
>> > >
>> > > On Sep 5, 2017 13:27, "Abi Askushi" <
rightkicktech at gmail.com >
>> wrote:
>> > >
>> > >
>> > >
>> > > Hi Krutika,
>> > >
>> > > Attached the profile stats. I enabled profiling then ran some
dd
>> tests.
>> > Also
>> > > 3 Windows VMs are running on top this volume but did not do
any stress
>> > > testing on the VMs. I have left the profiling enabled in case
more
>> time is
>> > > needed for useful stats.
>> > >
>> > > Thanx
>> > >
>> > > On Tue, Sep 5, 2017 at 12:48 PM, Krutika Dhananjay <
>> kdhananj at redhat.com >
>> > > wrote:
>> > >
>> > >
>> > >
>> > > OK my understanding is that with preallocated disks the
performance
>> with
>> > and
>> > > without shard will be the same.
>> > >
>> > > In any case, please attach the volume profile[1], so we can
see what
>> else
>> > is
>> > > slowing things down.
>> > >
>> > > -Krutika
>> > >
>> > > [1] -
>> > > https://gluster.readthedocs.io/en/latest/Administrator%
>> >
20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
>> > >
>> > > On Tue, Sep 5, 2017 at 2:32 PM, Abi Askushi <
rightkicktech at gmail.com
>> >
>> > > wrote:
>> > >
>> > >
>> > >
>> > > Hi Krutika,
>> > >
>> > > I already have a preallocated disk on VM.
>> > > Now I am checking performance with dd on the hypervisors
which have
>> the
>> > > gluster volume configured.
>> > >
>> > > I tried also several values of shard-block-size and I keep
getting the
>> > same
>> > > low values on write performance.
>> > > Enabling client-io-threads also did not have any affect.
>> > >
>> > > The version of gluster I am using is glusterfs 3.8.12 built
on May 11
>> 2017
>> > > 18:46:20.
>> > > The setup is a set of 3 Centos 7.3 servers and ovirt 4.1,
using
>> gluster as
>> > > storage.
>> > >
>> > > Below are the current settings:
>> > >
>> > >
>> > > Volume Name: vms
>> > > Type: Replicate
>> > > Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
>> > > Status: Started
>> > > Snapshot Count: 0
>> > > Number of Bricks: 1 x (2 + 1) = 3
>> > > Transport-type: tcp
>> > > Bricks:
>> > > Brick1: gluster0:/gluster/vms/brick
>> > > Brick2: gluster1:/gluster/vms/brick
>> > > Brick3: gluster2:/gluster/vms/brick (arbiter)
>> > > Options Reconfigured:
>> > > server.event-threads: 4
>> > > client.event-threads: 4
>> > > performance.client-io-threads: on
>> > > features.shard-block-size: 512MB
>> > > cluster.granular-entry-heal: enable
>> > > performance.strict-o-direct: on
>> > > network.ping-timeout: 30
>> > > storage.owner-gid: 36
>> > > storage.owner-uid: 36
>> > > user.cifs: off
>> > > features.shard: on
>> > > cluster.shd-wait-qlength: 10000
>> > > cluster.shd-max-threads: 8
>> > > cluster.locking-scheme: granular
>> > > cluster.data-self-heal-algorithm: full
>> > > cluster.server-quorum-type: server
>> > > cluster.quorum-type: auto
>> > > cluster.eager-lock: enable
>> > > network.remote-dio: off
>> > > performance.low-prio-threads: 32
>> > > performance.stat-prefetch: on
>> > > performance.io-cache: off
>> > > performance.read-ahead: off
>> > > performance.quick-read: off
>> > > transport.address-family: inet
>> > > performance.readdir-ahead: on
>> > > nfs.disable: on
>> > > nfs.export-volumes: on
>> > >
>> > >
>> > > I observed that when testing with dd if=/dev/zero of=testfile bs=1G count=1 I
>> > > get 65MB/s on the vms gluster volume (and the network traffic between the
>> > > servers reaches ~ 500Mbps), while when testing with dd if=/dev/zero
>> > > of=testfile bs=1G count=1 oflag=direct I get a consistent 10MB/s, with the
>> > > network traffic hardly reaching 100Mbps.
>> > >
>> > > Any other things one can do?
>> > >
>> > > On Tue, Sep 5, 2017 at 5:57 AM, Krutika Dhananjay <kdhananj at redhat.com>
>> > > wrote:
>> > >
>> > >
>> > >
>> > > I'm assuming you are using this volume to store VM images, because I see
>> > > shard in the options list.
>> > >
>> > > Speaking from the shard translator's POV, one thing you can do to improve
>> > > performance is to use preallocated images.
>> > > This will at least eliminate the need for shard to perform multiple steps as
>> > > part of the writes - such as creating the shard, then writing to it, and
>> > > then updating the aggregated file size - each of which requires one network
>> > > call, and which get blown up further into many more network calls once they
>> > > reach AFR (replicate).
>> > >
>> > > Second, I'm assuming you're using the default shard block size of 4MB (you
>> > > can confirm this using `gluster volume get <VOL> shard-block-size`). In our
>> > > tests, we've found that larger shard sizes perform better. So maybe change
>> > > the shard-block-size to 64MB (`gluster volume set <VOL> shard-block-size
>> > > 64MB`).
>> > >
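To get a feel for why the shard block size matters, here is a rough sketch of how many shard-sized pieces a 1 GiB write (the size used in the dd tests in this thread) is split into at 4MB, 64MB and 512MB. The helper name `shard_count` is made up for illustration, and the sketch assumes that "MB" in shard-block-size means 2**20 bytes; it only counts pieces and does not model the per-shard network calls described above.

```python
MIB = 1024 * 1024  # assumption: "MB" in shard-block-size means 2**20 bytes


def shard_count(file_size_bytes: int, shard_block_size_mb: int) -> int:
    """Number of shard-sized pieces needed to hold a file (hypothetical helper)."""
    block_bytes = shard_block_size_mb * MIB
    # Integer ceiling division: a partially filled last piece still counts.
    return -(-file_size_bytes // block_bytes)


if __name__ == "__main__":
    one_gib = 1024 * MIB  # matches dd bs=1G count=1
    for size_mb in (4, 64, 512):
        print(f"{size_mb}MB shards: {shard_count(one_gib, size_mb)} pieces")
```

Fewer pieces means fewer create and size-update steps for the same amount of data, which is consistent with the advice to try a larger shard size.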
>> > > Third, keep stat-prefetch enabled. We've found that qemu sends quite a lot of
>> > > [f]stats which can be served from the (md)cache to improve performance. So
>> > > enable that.
>> > >
>> > > Also, could you enable client-io-threads and see if that improves
>> > > performance?
>> > >
>> > > Which version of gluster are you using BTW?
>> > >
>> > > -Krutika
>> > >
>> > >
>> > > On Tue, Sep 5, 2017 at 4:32 AM, Abi Askushi <rightkicktech at gmail.com>
>> > > wrote:
>> > >
>> > >
>> > >
>> > > Hi all,
>> > >
>> > > I have a gluster volume used to host several VMs (managed through oVirt).
>> > > The volume is a replica 3 with arbiter and the 3 servers use a 1 Gbit network
>> > > for the storage.
>> > >
>> > > When testing with dd (dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct)
>> > > outside the volume (e.g. writing to /root/), dd reports ~ 700MB/s, which is
>> > > quite decent. When running the same dd on the gluster volume I get ~ 43 MB/s,
>> > > which is way lower. During the dd test on the gluster volume, the network
>> > > traffic did not exceed 450 Mbps on the network interface. I would expect to
>> > > reach near 900 Mbps, considering that there is 1 Gbit of bandwidth available.
>> > > This results in VMs with very slow performance (especially on their write
>> > > operations).
>> > >
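The figures above mix MB/s (as reported by dd) and Mbps (as seen on the network interface), so a small conversion sketch may help when comparing them. The function names are made up for illustration, and the sketch assumes decimal units (1 MB = 10**6 bytes) and ignores protocol overhead and replication traffic, so it gives only a rough comparison.

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert a rate in MB/s to Mbit/s (1 byte = 8 bits)."""
    return mbytes_per_s * 8


def mbits_to_mbytes(mbits_per_s: float) -> float:
    """Convert a rate in Mbit/s to MB/s."""
    return mbits_per_s / 8


if __name__ == "__main__":
    # dd throughput reported on the gluster volume, expressed in Mbit/s
    print(mbytes_to_mbits(43))
    # observed and hoped-for interface rates, expressed in MB/s
    print(mbits_to_mbytes(450))
    print(mbits_to_mbytes(900))
```

Under these assumptions, 43 MB/s of payload corresponds to 344 Mbit/s, and the hoped-for 900 Mbps would correspond to about 112 MB/s.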
>> > > The full details of the volume are below. Any advice on what can be tweaked
>> > > will be highly appreciated.
>> > >
>> > > Volume Name: vms
>> > > Type: Replicate
>> > > Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
>> > > Status: Started
>> > > Snapshot Count: 0
>> > > Number of Bricks: 1 x (2 + 1) = 3
>> > > Transport-type: tcp
>> > > Bricks:
>> > > Brick1: gluster0:/gluster/vms/brick
>> > > Brick2: gluster1:/gluster/vms/brick
>> > > Brick3: gluster2:/gluster/vms/brick (arbiter)
>> > > Options Reconfigured:
>> > > cluster.granular-entry-heal: enable
>> > > performance.strict-o-direct: on
>> > > network.ping-timeout: 30
>> > > storage.owner-gid: 36
>> > > storage.owner-uid: 36
>> > > user.cifs: off
>> > > features.shard: on
>> > > cluster.shd-wait-qlength: 10000
>> > > cluster.shd-max-threads: 8
>> > > cluster.locking-scheme: granular
>> > > cluster.data-self-heal-algorithm: full
>> > > cluster.server-quorum-type: server
>> > > cluster.quorum-type: auto
>> > > cluster.eager-lock: enable
>> > > network.remote-dio: off
>> > > performance.low-prio-threads: 32
>> > > performance.stat-prefetch: off
>> > > performance.io-cache: off
>> > > performance.read-ahead: off
>> > > performance.quick-read: off
>> > > transport.address-family: inet
>> > > performance.readdir-ahead: on
>> > > nfs.disable: on
>> > > nfs.export-volumes: on
>> > >
>> > >
>> > > Thanx,
>> > > Alex
>> > >
>> > > _______________________________________________
>> > > Gluster-users mailing list
>> > > Gluster-users at gluster.org
>> > > http://lists.gluster.org/mailman/listinfo/gluster-users
