Vijay Bellur
2015-Oct-16 16:51 UTC
[Gluster-users] Unnecessary healing in 3-node replication setup on reboot
On Friday 16 October 2015 08:11 PM, Lindsay Mathieson wrote:
> On 17 October 2015 at 00:26, Udo Giacomozzi <udo.giacomozzi at indunet.it
> <mailto:udo.giacomozzi at indunet.it>> wrote:
>
>> To me this sounds like Gluster is not really suited for big files,
>> like as the main storage for VMs - since they are being modified
>> constantly.
>
> Depends :)
>
> Any replicated storage will have to heal its copies if they are written
> to while a node is down. So long as the files can still be read/written
> while being healed, and the resource usage (CPU/network) is not too
> high, healing should be transparent - that's the whole point of a
> replicated filesystem.
>
> I'm guessing that, like me, you are running your gluster storage on
> your VM hosts and that, like me, you are a chronic tweaker, so you tend
> to reboot the hosts more often than you should. In that case you might
> want to consider moving your gluster storage to separate dedicated
> nodes that you can leave up.
>
>> Or am I missing something? Perhaps Gluster can be configured to heal
>> only modified parts of the files?
>
> Not that I know of.

Self-healing in gluster by default syncs only the modified parts of the
files from a source node. Gluster computes a rolling checksum of a file
needing self-heal to identify the regions of the file which need to be
synced over the network. This rolling checksum computation can sometimes
be expensive, and there are plans for a lighter self-healing in 3.8 with
more granular changelogs that can do away with the need for a rolling
checksum.

You may also want to check out sharding (currently in beta with 3.7),
where large files are chunked into smaller fragments. With this scheme,
self-healing (and thereby rolling checksum computation) happens only on
those fragments that undergo changes while one of the nodes in a
replicated set is offline. This has shown nice improvements in gluster's
resource utilization during self-healing.

Regards,
Vijay
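[Editor's note: for anyone wanting to try this, sharding is a per-volume
option. A minimal sketch, assuming Gluster 3.7 and a placeholder volume
name "testvol" - the option names below belong to the 3.7 shard
translator, whose features.shard-block-size defaults to 4MB:

    # enable sharding on an existing test volume (placeholder name);
    # files written after this are chunked into fixed-size fragments
    gluster volume set testvol features.shard on

    # pick a larger fragment size for VM images, e.g. 64MB
    gluster volume set testvol features.shard-block-size 64MB

Note that the setting only applies to files created after it is turned
on, which is consistent with the suggestion below that a fresh volume is
the cleanest way to test it.]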
Lindsay Mathieson
2015-Oct-16 22:17 UTC
[Gluster-users] Unnecessary healing in 3-node replication setup on reboot
On 17 October 2015 at 02:51, Vijay Bellur <vbellur at redhat.com> wrote:
> You may also want to check out sharding (currently in beta with 3.7),
> where large files are chunked into smaller fragments. With this scheme,
> self-healing (and thereby rolling checksum computation) happens only on
> those fragments that undergo changes while one of the nodes in a
> replicated set is offline. This has shown nice improvements in
> gluster's resource utilization during self-healing.

Very interesting. I presume you'd have to create a new volume to test
it. Also, you'd lose the ability to access the file on the host
filesystem in emergencies, wouldn't you?

--
Lindsay
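[Editor's note: on brick-level access - with sharding, only the first
fragment stays at the file's original path on the brick; the remaining
fragments land in a hidden .shard directory, named by the file's GFID
plus a block index. A sketch of what emergency recovery would look like,
with hypothetical paths and <gfid> as a placeholder:

    # the first block keeps the original brick path
    ls -l /data/gluster/systems/images/vm-100-disk-1.qcow2

    # read the file's GFID from its extended attributes (on the brick)
    getfattr -n trusted.gfid -e hex \
        /data/gluster/systems/images/vm-100-disk-1.qcow2

    # remaining blocks live under the brick's hidden .shard directory
    ls /data/gluster/systems/.shard/ | grep <gfid>

    # recovery = concatenating base file + shards in index order
    cd /data/gluster/systems
    cat images/vm-100-disk-1.qcow2 .shard/<gfid>.1 .shard/<gfid>.2 \
        > /tmp/recovered.img

So the data is still reachable on the brick, but no longer as one plain
file you can simply copy off.]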
Lindsay Mathieson
2015-Oct-16 22:45 UTC
[Gluster-users] Unnecessary healing in 3-node replication setup on reboot
On 17 October 2015 at 02:51, Vijay Bellur <vbellur at redhat.com> wrote:
> You may also want to check out sharding (currently in beta with 3.7),
> where large files are chunked into smaller fragments. With this scheme,
> self-healing (and thereby rolling checksum computation) happens only on
> those fragments that undergo changes while one of the nodes in a
> replicated set is offline. This has shown nice improvements in
> gluster's resource utilization during self-healing.

Does it affect read speed and random I/O? I guess that would depend on
the method used to calculate the shard location for a given block.

It could be quite interesting on top of ZFS - I'd love to test it.

--
Lindsay
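[Editor's note: assuming placement is a plain fixed-size split - i.e.
fragment N holds bytes [N*size, (N+1)*size) - then locating the fragment
for a given offset is constant-time integer arithmetic, so it should add
little overhead to reads or random I/O. A sketch with hypothetical
numbers:

    # which 64MB shard holds byte offset 123456789?
    OFFSET=123456789
    SHARD_SIZE=$((64 * 1024 * 1024))
    echo "shard index:     $((OFFSET / SHARD_SIZE))"    # -> 1
    echo "offset in shard: $((OFFSET % SHARD_SIZE))"    # -> 56347925

The fixed-split layout is an inference from the description above, not
something confirmed in this thread.]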
Udo Giacomozzi
2015-Oct-17 15:38 UTC
[Gluster-users] Unnecessary healing in 3-node replication setup on reboot
On 16.10.2015 at 18:51, Vijay Bellur wrote:
> Self-healing in gluster by default syncs only the modified parts of
> the files from a source node. Gluster computes a rolling checksum of a
> file needing self-heal to identify the regions of the file which need
> to be synced over the network. This rolling checksum computation can
> sometimes be expensive, and there are plans for a lighter self-healing
> in 3.8 with more granular changelogs that can do away with the need
> for a rolling checksum.

I did some tests (see below) - could you please check this and tell me
whether it is normal?

For example, I have a 200GB VM disk image in my volume (the biggest
file). About 75% of that disk is currently unused space and writes are
only about 50 kbytes/sec. Yet that 200GB disk image /always/ "heals" for
a very long time (at least 30 minutes) - even though I'm pretty sure
only a few blocks can have changed.

Anyway, I just rebooted a node (about 2-3 minutes downtime) to collect
some information:

* in total I have about 790GB* of files in that Gluster volume
* about 411GB* belong to active VM HDD images; the rest are
  backup/template files
* only VM HDD images are being healed (max 15 files)
* while healing, glusterfsd shows varying CPU usage between 70% and 650%
  (it's a 16-core server); 106 minutes of total CPU time once healing
  completed
* once healing completed, the machine had received a total of 7.0 GB and
  sent 3.6 GB over the internal network (so, yes, you're right that not
  all contents are transferred)
* *total heal time: a whopping 58 minutes*

(* these are summed-up file sizes; "du" and "df" show smaller usage)

Node details (all 3 nodes are identical):

* DELL PowerEdge R730
* Intel Xeon E5-2600 @ 2.4GHz
* 64 GB DDR4 RAM
* the server can gzip-compress about 1 GB of data per second (all cores
  together)
* 3 TB HW-RAID10 HDD (2.7TB reserved for Gluster); at least 500 MB/s
  write speed, 350 MB/s read speed
* redundant 1GBit/s internal network
* Debian 7 Wheezy / Proxmox 3.4, Kernel 2.6.32, Gluster 3.5.2

Volume setup:

# gluster volume info systems

Volume Name: systems
Type: Replicate
Volume ID: b2d72784-4b0e-4f7b-b858-4ec59979a064
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: metal1:/data/gluster/systems
Brick2: metal2:/data/gluster/systems
Brick3: metal3:/data/gluster/systems
Options Reconfigured:
cluster.server-quorum-ratio: 51%

Note that `gluster volume heal "systems" info` takes 3-10 seconds to
complete during a heal - I hope that doesn't slow down healing, since I
tend to run that command frequently.

Would you expect these results, or is something wrong? Would upgrading
to Gluster 3.6 or 3.7 improve healing performance?

Thanks,
Udo
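[Editor's note: a lighter-weight way to watch heal progress than
repeatedly running the full `heal info` scan is sketched below. The
`statistics heal-count` subcommand exists in later 3.x releases but may
not be available on 3.5.2, so treat this as an assumption to verify on
your version:

    # per-brick count of entries still pending heal (cheaper than the
    # full "heal info" listing)
    gluster volume heal systems statistics heal-count

    # poll once a minute rather than hammering the self-heal daemon
    watch -n 60 'gluster volume heal systems statistics heal-count'

    # CPU time accumulated so far by the brick processes
    ps -o etime,cputime,cmd -C glusterfsd
]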