Lindsay Mathieson
2016-Jan-19 11:24 UTC
[Gluster-users] File Corruption when adding bricks to live replica volumes
gluster 3.7.6

I seem to be able to reliably reproduce this. I have a replica 2 volume with one test VM image. While the VM is running with heavy disk reads/writes (a disk benchmark), I add a third brick for replica 3:

gluster volume add-brick datastore1 replica 3 vng.proxmox.softlog:/vmdata/datastore1

I pretty much immediately get this:

gluster volume heal datastore1 info

Brick vna.proxmox.softlog:/vmdata/datastore1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.20
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.55 - Possibly undergoing heal
/images/301/vm-301-disk-1.qcow2 - Possibly undergoing heal
Number of entries: 4

Brick vnb.proxmox.softlog:/vmdata/datastore1
/images/301/vm-301-disk-1.qcow2 - Possibly undergoing heal
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.55 - Possibly undergoing heal
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.20
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
Number of entries: 4

Brick vng.proxmox.softlog:/vmdata/datastore1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.16
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.28
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.1
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.22
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.77
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.9
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.2
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.26
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.15
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.13
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.3
/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.18
Number of entries: 13

The brick on vng is the new, empty brick, yet it shows 13 shards being healed back to vna & vnb. That can't be right, and if I leave it, the VM becomes hopelessly corrupted. Also, the image consists of 81 shards; they should all be queued for healing. Additionally, I get read errors when I run a qemu-img check on the VM image. If I remove the vng brick, the problems are resolved.

If I do the same process while the VM is not running - i.e. no files are being accessed - everything proceeds as expected: all shards on vna & vnb are healed to vng.

--
Lindsay Mathieson
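[The check and revert mentioned above, as a shell sketch. The qcow2 path on the client mount and the exact remove-brick invocation for dropping back to replica 2 are assumptions based on the standard gluster/qemu CLIs, not taken verbatim from the report.]

# verify the VM image from the client mount (mount path is an assumption)
qemu-img check /mnt/datastore1/images/301/vm-301-disk-1.qcow2

# revert to replica 2 by removing the newly added brick (syntax assumed from the standard gluster CLI)
gluster volume remove-brick datastore1 replica 2 vng.proxmox.softlog:/vmdata/datastore1 force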
Krutika Dhananjay
2016-Jan-19 12:06 UTC
[Gluster-users] File Corruption when adding bricks to live replica volumes
Hi Lindsay,

Just to be sure we are not missing any steps here: you did invoke 'gluster volume heal datastore1 full' after adding the third brick, so that the heal could begin, right?

As far as the reverse heal is concerned, there is an issue with add-brick when the replica count is increased, for which the fix is still under review. Could you instead try the following steps at the time of add-brick and tell me if it works fine:

1. Run 'gluster volume add-brick datastore1 replica 3 vng.proxmox.softlog:/vmdata/datastore1' as usual.
2. Kill the glusterfsd process corresponding to the newly added brick (the brick on vng in your case). You should be able to get its pid from the output of 'gluster volume status datastore1'.
3. Create a dummy file in the root of the volume from the mount point. It can have any random name.
4. Delete the dummy file created in step 3.
5. Bring the killed brick back up. For this, you can run 'gluster volume start datastore1 force'.
6. Then execute 'gluster volume heal datastore1 full' on the node with the highest uuid (this we know how to do from the previous thread on the same topic), and monitor the heal-info output to track heal progress.

Let me know if this works.

-Krutika
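[The workaround above, gathered into a single shell sketch. The client mount path /mnt/datastore1 and the brick PID are placeholders; the node with the highest UUID has to be chosen as described in step 6.]

gluster volume add-brick datastore1 replica 3 vng.proxmox.softlog:/vmdata/datastore1
gluster volume status datastore1          # note the PID of the new vng brick process
kill <PID-of-vng-brick>                   # placeholder; use the PID from the status output
touch /mnt/datastore1/dummyfile           # any random name, created from the client mount (path assumed)
rm /mnt/datastore1/dummyfile
gluster volume start datastore1 force     # brings the killed brick back up
gluster volume heal datastore1 full       # run on the node with the highest UUID
gluster volume heal datastore1 info       # monitor heal progress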