Sahina Bose
2017-Jun-20 07:21 UTC
[Gluster-users] [ovirt-users] Very poor GlusterFS performance
[Adding gluster-users]

On Mon, Jun 19, 2017 at 8:16 PM, Chris Boot <bootc at bootc.net> wrote:

> Hi folks,
>
> I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
> configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
> 6 bricks, which themselves live on two SSDs in each of the servers (one
> brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
> SSDs. Connectivity is 10G Ethernet.
>
> Performance within the VMs is pretty terrible. I experience very low
> throughput and random IO is really bad: it feels like a latency issue.
> On my oVirt nodes the SSDs are not generally very busy. The 10G network
> seems to run without errors (iperf3 gives bandwidth measurements
> of > 9.20 Gbits/sec between the three servers).
>
> To put this into perspective: I was getting better behaviour from NFS4
> on a gigabit connection than I am with GlusterFS on 10G: that doesn't
> feel right at all.
>
> My volume configuration looks like this:
>
> Volume Name: vmssd
> Type: Distributed-Replicate
> Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: ovirt3:/gluster/ssd0_vmssd/brick
> Brick2: ovirt1:/gluster/ssd0_vmssd/brick
> Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
> Brick4: ovirt3:/gluster/ssd1_vmssd/brick
> Brick5: ovirt1:/gluster/ssd1_vmssd/brick
> Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet6
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> features.shard: on
> user.cifs: off
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard-block-size: 128MB
> performance.strict-o-direct: on
> network.ping-timeout: 30
> cluster.granular-entry-heal: enable
>
> I would really appreciate some guidance on this to try to improve things
> because at this rate I will need to reconsider using GlusterFS altogether.

Could you provide the gluster volume profile output while you're running
your I/O tests?

# gluster volume profile <volname> start
to start profiling

# gluster volume profile <volname> info
for the profile output.

> Cheers,
> Chris
>
> --
> Chris Boot
> bootc at bootc.net
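For what it's worth, a minimal profiling sequence for this volume might look
something like the following. The volume name vmssd is taken from the output
above; the fio job is only an illustrative load generator to run from inside
one of the VMs, and its parameters are placeholders rather than a
recommendation:

# gluster volume profile vmssd start
  (then generate load from inside a test VM while profiling is active, e.g.)
  fio --name=randrw --ioengine=libaio --direct=1 --rw=randrw --bs=4k \
      --iodepth=32 --size=1G --runtime=60 --filename=/var/tmp/fio.test
# gluster volume profile vmssd info > /tmp/vmssd-profile.txt
# gluster volume profile vmssd stop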
Lindsay Mathieson
2017-Jun-20 07:30 UTC
[Gluster-users] [ovirt-users] Very poor GlusterFS performance
Have you tried with:

performance.strict-o-direct: off
performance.strict-write-ordering: off

They can be changed dynamically.

On 20 June 2017 at 17:21, Sahina Bose <sabose at redhat.com> wrote:

> [Adding gluster-users]
> [...]

--
Lindsay
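As a concrete sketch, using the volume name from the original post (vmssd is
an assumption carried over from Chris's volume info; the two gluster volume
get calls are only there to confirm the new values took effect):

# gluster volume set vmssd performance.strict-o-direct off
# gluster volume set vmssd performance.strict-write-ordering off
# gluster volume get vmssd performance.strict-o-direct
# gluster volume get vmssd performance.strict-write-ordering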
Krutika Dhananjay
2017-Jun-20 10:23 UTC
[Gluster-users] [ovirt-users] Very poor GlusterFS performance
Couple of things:

1. Like Darrell suggested, you should enable stat-prefetch and increase
   client and server event threads to 4.

   # gluster volume set <VOL> performance.stat-prefetch on
   # gluster volume set <VOL> client.event-threads 4
   # gluster volume set <VOL> server.event-threads 4

2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
   https://review.gluster.org/#/c/16966/

With these two changes, we saw great improvement in performance in our
internal testing.

Do you mind trying these two options above?

-Krutika

On Tue, Jun 20, 2017 at 1:00 PM, Lindsay Mathieson
<lindsay.mathieson at gmail.com> wrote:

> Have you tried with:
>
> performance.strict-o-direct: off
> performance.strict-write-ordering: off
>
> They can be changed dynamically.
> [...]
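In case it helps, applied to the volume from the original post the above
would look something like this. The volume name vmssd comes from Chris's
volume info, and the rpm query assumes an RPM-based install such as the
oVirt node hosts, so adjust for your packaging when checking that you are
on glusterfs >= 3.10.1 for the shard fix:

# gluster volume set vmssd performance.stat-prefetch on
# gluster volume set vmssd client.event-threads 4
# gluster volume set vmssd server.event-threads 4
# gluster --version
# rpm -q glusterfs-server glusterfs-fuse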