Lindsay Mathieson
2016-Apr-24 01:12 UTC
[Gluster-users] File replicas out of sync/How to force heal on a file
I have two problems here. I have a replica 3 sharded volume, gluster 3.7.11, used for VM images. Yesterday I stopped the volume and ran md5sum on all the shards to compare the 3 replicas. All 15 VM images were identical except for one (vm-307). It has 2048 shards, of which 8 differed.

"volume heal info" lists *no* files needing to be healed.

Two things concern me:

1. How did this happen? Trust in gluster either keeping replicas in sync, or knowing when they are not, is crucial.

2. How do I force a heal of an individual file? I can find no documentation on this process, or even whether it is possible.

I do have one possible solution - delete the VM image and restore from backup. Not ideal.

Notes:

- I did have a hard disk failure on a brick while testing. ZFS recovered it with no errors.

- My testing was reasonably severe - server reboots and killing of the gluster processes. All things that will happen in a cluster's lifetime. I was pleased with how well gluster handled them.

--
Lindsay Mathieson
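The checksum comparison described above can be sketched as a small script. This is a hypothetical sketch, not a gluster tool: it assumes the per-brick .shard directories are all reachable from one node (e.g. via network mounts, with the volume stopped), and the paths and function names are illustrative.

```python
import hashlib
from pathlib import Path

def md5_of(path):
    """md5sum of one file, read in 1 MiB chunks so large shards don't fill RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def mismatched_shards(bricks):
    """Compare every shard on the first brick against the other replicas.

    Returns {shard_name: {brick_path: checksum}} for shards whose
    replicas do not all carry the same checksum.
    """
    bad = {}
    for shard in sorted(bricks[0].iterdir()):
        sums = {str(b): md5_of(b / shard.name)
                for b in bricks if (b / shard.name).exists()}
        if len(set(sums.values())) > 1:
            bad[shard.name] = sums
    return bad

# Hypothetical brick layout - one .shard directory per replica:
# bricks = [Path("/mnt/vna/.shard"), Path("/mnt/vnb/.shard"), Path("/mnt/vng/.shard")]
# for name, sums in mismatched_shards(bricks).items():
#     print(name, sums)
```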
Lindsay Mathieson
2016-Apr-24 06:13 UTC
[Gluster-users] File replicas out of sync/How to force heal on a file
On 24/04/2016 11:12 AM, Lindsay Mathieson wrote:
> Yesterday I stopped the volume and ran md5sum on all the shards to
> compare the 3 replicas. All 15 VM images were identical except for one
> (vm-307). It has 2048 shards, of which 8 differed.
>
> volume heal info lists *no* files needing to be healed.
> [...]

Duplicating from a separate msg how I resolved the immediate issue:

I used diff3 to compare the checksums of the shards, and it revealed that seven of the shards were the same on two bricks (vna & vng) and one of the shards was the same on two other bricks (vna & vnb). Fortunately none were different on all 3 bricks :)

Using the checksums as a quorum, I deleted all the singleton shards (7 on vnb, 1 on vng), touched the file owner, and issued a "heal full". All 8 shards were restored with checksums matching the other two bricks. A recheck of the entire set of shards for the VM showed all 3 copies as identical, and the VM itself is functioning normally.

It's one way to manually heal shard mismatches which gluster hasn't detected, if somewhat tedious. It's a method which lends itself to automation though.

--
Lindsay Mathieson
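The quorum-by-checksum step above - treat the checksum held by two of the three bricks as correct and flag the odd one out for deletion - can be sketched as follows. The brick names come from the message; the function name and shapes are mine, a sketch rather than anything gluster ships.

```python
from collections import Counter

def find_outliers(checksums):
    """Given a mapping of brick name -> checksum for one shard, return
    the bricks whose checksum disagrees with the majority (quorum) value.

    Returns [] when all replicas agree, and None when no two bricks
    agree at all - that case needs manual inspection, not deletion.
    """
    counts = Counter(checksums.values())
    majority, votes = counts.most_common(1)[0]
    if votes < 2:
        return None  # no quorum among the replicas
    return [brick for brick, c in checksums.items() if c != majority]

# Matching the scenario above: vna and vng agree, so vnb holds the
# singleton shard that should be deleted before running "heal full".
print(find_outliers({"vna": "abc123", "vnb": "ff9900", "vng": "abc123"}))  # ['vnb']
```

The deliberate design choice is returning None when all three copies differ: with no quorum, automatically deleting anything would destroy the only chance of recovering the right data.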