Pranith Kumar Karampuri
2016-Jan-12 03:32 UTC
[Gluster-users] High I/O And Processor Utilization
On 01/12/2016 08:52 AM, Lindsay Mathieson wrote:
> On 11/01/16 15:37, Krutika Dhananjay wrote:
>> Kyle,
>>
>> Based on the testing we have done from our end, we've found that
>> 512MB is a good number that is neither too big nor too small,
>> and provides good performance both on the IO side and with respect to
>> self-heal.
>
> Hi Krutika, I experimented a lot with different chunk sizes and didn't
> find all that much difference between 4MB and 1GB.
>
> But benchmarks are tricky things - I used Crystal Diskmark inside a
> VM, which is probably not the best assessment. And two of the bricks
> on my replica 3 are very slow, just test drives, not production. So I
> guess that would affect things :)
>
> These are my current settings - what do you use?
>
> Volume Name: datastore1
> Type: Replicate
> Volume ID: 1261175d-64e1-48b1-9158-c32802cc09f0
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vnb.proxmox.softlog:/vmdata/datastore1
> Brick2: vng.proxmox.softlog:/vmdata/datastore1
> Brick3: vna.proxmox.softlog:/vmdata/datastore1
> Options Reconfigured:
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.stat-prefetch: off
> performance.strict-write-ordering: on
> performance.write-behind: off
> nfs.enable-ino32: off
> nfs.addr-namelookup: off
> nfs.disable: on
> performance.cache-refresh-timeout: 4
> performance.io-thread-count: 32
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> client.event-threads: 4
> server.event-threads: 4
> cluster.self-heal-window-size: 256
> features.shard-block-size: 512MB
> features.shard: on
> performance.readdir-ahead: off

Most of these tests were done by Paul Cuzner (CCed).

Pranith
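P.S. In case it helps anyone else tuning this: assuming a volume named "datastore1" as in the config above, the shard options are applied with the usual volume-set commands, e.g.

    gluster volume set datastore1 features.shard on
    gluster volume set datastore1 features.shard-block-size 512MB

As far as I know, a changed block size only applies to files created after the change; existing VM images keep the shard size they were written with.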
I thought I might take a minute to update everyone on this situation.

I rebuilt the GlusterFS volume using a shard size of 256MB and then imported all the VMs back onto the cluster. I rebuilt it from scratch rather than just doing an export/import on the data so I could start everything fresh. I wish now that I had used 512MB, but unfortunately I didn't see that suggestion in time.

Anyway, the good news is that the system load has greatly decreased, and the systems are all now in a usable state. The bad news is that I am still seeing a lot of heals and am not sure why. Because of that, I am also still seeing the drives slow down from over 110 MB/sec without Gluster running on them to ~25 MB/sec with bricks running on the drives, so it seems to me there is still an issue here.

Also, as fate would have it, I am hitting a bug in the Gluster NFS implementation on this version (3.7) that requires setting nfs.acl to off, so I am also hoping for a fix for that soon. I think this is the bug, but I'm not sure: https://bugzilla.redhat.com/show_bug.cgi?id=1238318

So to sum up, things are working, but the performance leaves much to be desired, mostly (I suspect) due to all the heals taking place.

- Kyle
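P.S. For anyone following along and hitting the same issues: the pending heals can be watched with the standard heal-info command, and the ACL workaround mentioned above is a single volume-set, e.g. (substituting your own volume name):

    gluster volume heal <volname> info
    gluster volume set <volname> nfs.acl off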