Krutika Dhananjay
2017-Jun-20 10:23 UTC
[Gluster-users] [ovirt-users] Very poor GlusterFS performance
Couple of things:

1. Like Darrell suggested, you should enable stat-prefetch and increase
client and server event threads to 4.

# gluster volume set <VOL> performance.stat-prefetch on
# gluster volume set <VOL> client.event-threads 4
# gluster volume set <VOL> server.event-threads 4

2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
https://review.gluster.org/#/c/16966/

With these two changes, we saw great improvement in performance in our
internal testing.

Do you mind trying these two options above?

-Krutika

On Tue, Jun 20, 2017 at 1:00 PM, Lindsay Mathieson
<lindsay.mathieson at gmail.com> wrote:

> Have you tried with:
>
> performance.strict-o-direct : off
> performance.strict-write-ordering : off
>
> They can be changed dynamically.
>
> On 20 June 2017 at 17:21, Sahina Bose <sabose at redhat.com> wrote:
>
>> [Adding gluster-users]
>>
>> On Mon, Jun 19, 2017 at 8:16 PM, Chris Boot <bootc at bootc.net> wrote:
>>
>>> Hi folks,
>>>
>>> I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
>>> configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
>>> 6 bricks, which themselves live on two SSDs in each of the servers (one
>>> brick per SSD). The bricks are XFS on LVM thin volumes straight onto
>>> the SSDs. Connectivity is 10G Ethernet.
>>>
>>> Performance within the VMs is pretty terrible. I experience very low
>>> throughput and random IO is really bad: it feels like a latency issue.
>>> On my oVirt nodes the SSDs are not generally very busy. The 10G network
>>> seems to run without errors (iperf3 gives bandwidth measurements of
>>> >9.20 Gbits/sec between the three servers).
>>>
>>> To put this into perspective: I was getting better behaviour from NFS4
>>> on a gigabit connection than I am with GlusterFS on 10G: that doesn't
>>> feel right at all.
>>>
>>> My volume configuration looks like this:
>>>
>>> Volume Name: vmssd
>>> Type: Distributed-Replicate
>>> Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 2 x (2 + 1) = 6
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ovirt3:/gluster/ssd0_vmssd/brick
>>> Brick2: ovirt1:/gluster/ssd0_vmssd/brick
>>> Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
>>> Brick4: ovirt3:/gluster/ssd1_vmssd/brick
>>> Brick5: ovirt1:/gluster/ssd1_vmssd/brick
>>> Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
>>> Options Reconfigured:
>>> nfs.disable: on
>>> transport.address-family: inet6
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: off
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10000
>>> features.shard: on
>>> user.cifs: off
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> features.shard-block-size: 128MB
>>> performance.strict-o-direct: on
>>> network.ping-timeout: 30
>>> cluster.granular-entry-heal: enable
>>>
>>> I would really appreciate some guidance on this to try to improve
>>> things because at this rate I will need to reconsider using GlusterFS
>>> altogether.
>>
>> Could you provide the gluster volume profile output while you're
>> running your I/O tests.
>>
>> # gluster volume profile <volname> start
>> to start profiling
>>
>> # gluster volume profile <volname> info
>> for the profile output.
>>
>>> Cheers,
>>> Chris
>>>
>>> --
>>> Chris Boot
>>> bootc at bootc.net
>
> --
> Lindsay
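Krutika's two suggestions and Sahina's profiling request, taken together,
amount to a short command sequence. The following is only an illustrative
sketch: the volume name vmssd is taken from the quoted configuration above,
the output path is arbitrary, and the "gluster volume get" calls are there
simply to read the options back and confirm they were applied.

# gluster volume set vmssd performance.stat-prefetch on
# gluster volume set vmssd client.event-threads 4
# gluster volume set vmssd server.event-threads 4
# gluster volume get vmssd performance.stat-prefetch
# gluster volume get vmssd client.event-threads
# gluster volume get vmssd server.event-threads
# gluster volume profile vmssd start
  (run the I/O test inside a VM while profiling is active)
# gluster volume profile vmssd info > /tmp/vmssd-profile.txt
# gluster volume profile vmssd stop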
mabi
2017-Jun-20
[Gluster-users] [ovirt-users] Very poor GlusterFS performance

Dear Krutika,

Sorry for asking so naively, but can you tell me on what basis you
recommend that the client and server event-threads parameters for a
volume should be set to 4?

Is this metric, for example, based on the number of cores a GlusterFS
server has?

I am asking because I saw my GlusterFS volumes are set to 2 and would
like to set these parameters to something meaningful for performance
tuning. My setup is a two-node replica with GlusterFS 3.8.11.

Best regards,
M.

-------- Original Message --------
Subject: Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
Local Time: June 20, 2017 12:23 PM
UTC Time: June 20, 2017 10:23 AM
From: kdhananj at redhat.com
To: Lindsay Mathieson <lindsay.mathieson at gmail.com>,
gluster-users <gluster-users at gluster.org>, oVirt users <users at ovirt.org>

[...]
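Krutika answers the "why 4" question later in the thread. In the meantime,
the current values are easy to read back; a minimal sketch, assuming the
GlusterFS CLI is available on one of the servers and using a hypothetical
volume name myvol:

# gluster volume get myvol client.event-threads
# gluster volume get myvol server.event-threads
  (mabi reports both showing 2 on GlusterFS 3.8.11)
# nproc
  (core count of the server, for comparison with the chosen thread count)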
Chris Boot
2017-Jun-21 10:18 UTC
[Gluster-users] [ovirt-users] Very poor GlusterFS performance
[replying to lists this time]

On 20/06/17 11:23, Krutika Dhananjay wrote:
> Couple of things:
>
> 1. Like Darrell suggested, you should enable stat-prefetch and increase
> client and server event threads to 4.
> # gluster volume set <VOL> performance.stat-prefetch on
> # gluster volume set <VOL> client.event-threads 4
> # gluster volume set <VOL> server.event-threads 4
>
> 2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
> https://review.gluster.org/#/c/16966/
>
> With these two changes, we saw great improvement in performance in our
> internal testing.

Hi Krutika,

Thanks for your input. I have yet to run any benchmarks, but I'll do
that once I have a bit more time to work on this.

I've tweaked the options as you suggest, but that doesn't seem to have
made an appreciable difference. I admit that without benchmarks it's a
bit like sticking your finger in the air, though. Do I need to restart
my bricks and/or remount the volumes for these to take effect?

I'm actually running GlusterFS 3.10.2-1. This is all coming from the
CentOS Storage SIG's centos-release-gluster310 repository.

Thanks again.

Chris

--
Chris Boot
bootc at bootc.net
Krutika Dhananjay
2017-Jun-21 10:32 UTC
[Gluster-users] [ovirt-users] Very poor GlusterFS performance
No. It's just that in the internal testing that was done here, increasing
the thread count beyond 4 did not improve the performance any further.

-Krutika

On Tue, Jun 20, 2017 at 11:30 PM, mabi <mabi at protonmail.ch> wrote:

> Dear Krutika,
>
> Sorry for asking so naively, but can you tell me on what basis you
> recommend that the client and server event-threads parameters for a
> volume should be set to 4?
>
> Is this metric, for example, based on the number of cores a GlusterFS
> server has?
>
> [...]
Krutika Dhananjay
2017-Jun-21 10:34 UTC
[Gluster-users] [ovirt-users] Very poor GlusterFS performance
No, you don't need to do any of that. Just executing volume-set commands
is sufficient for the changes to take effect.

-Krutika

On Wed, Jun 21, 2017 at 3:48 PM, Chris Boot <bootc at bootc.net> wrote:

> [replying to lists this time]
>
> On 20/06/17 11:23, Krutika Dhananjay wrote:
> > Couple of things:
> > [...]
>
> Hi Krutika,
>
> Thanks for your input. I have yet to run any benchmarks, but I'll do
> that once I have a bit more time to work on this.
>
> I've tweaked the options as you suggest, but that doesn't seem to have
> made an appreciable difference. I admit that without benchmarks it's a
> bit like sticking your finger in the air, though. Do I need to restart
> my bricks and/or remount the volumes for these to take effect?
>
> I'm actually running GlusterFS 3.10.2-1. This is all coming from the
> CentOS Storage SIG's centos-release-gluster310 repository.
>
> Thanks again.
>
> Chris
>
> --
> Chris Boot
> bootc at bootc.net
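A quick way to confirm Krutika's point on a live volume is to change an
option and immediately read it back, with no brick restart or client
remount in between; a small sketch with a hypothetical volume name myvol:

# gluster volume set myvol client.event-threads 4
# gluster volume get myvol client.event-threads
  (the new value is reported straight away)
# gluster volume info myvol
  (the change is also listed under "Options Reconfigured")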
Chris Boot
2017-Jun-26 15:09 UTC
[Gluster-users] [ovirt-users] Very poor GlusterFS performance
On 21/06/17 11:18, Chris Boot wrote:
> Thanks for your input. I have yet to run any benchmarks, but I'll do
> that once I have a bit more time to work on this.

Is there a particular benchmark test that I should run to gather some
stats for this? Would certain tests be more useful than others?

Thanks,
Chris

--
Chris Boot
bootc at bootc.net
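The thread does not settle on a specific benchmark. One commonly used tool
for this kind of VM-storage investigation is fio; the invocation below is
only a hypothetical sketch (the tool choice and every parameter are
assumptions, not something suggested in the thread), meant to be run inside
a guest while "gluster volume profile" is collecting data:

# fio --name=vm-randrw --filename=/var/tmp/fio-test --size=1G \
      --ioengine=libaio --direct=1 --rw=randrw --rwmixread=70 \
      --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based \
      --group_reporting
  (a 70/30 random read/write mix with 4k blocks and direct I/O, roughly
  matching the latency-sensitive random I/O pattern Chris describes
  earlier in the thread)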