Krutika Dhananjay
2016-Jan-18 12:24 UTC
[Gluster-users] Issues removing then adding a brick to a replica volume (Gluster 3.7.6)
----- Original Message -----
> From: "Lindsay Mathieson" <lindsay.mathieson at gmail.com>
> To: "gluster-users" <gluster-users at gluster.org>
> Sent: Monday, January 18, 2016 11:19:22 AM
> Subject: [Gluster-users] Issues removing then adding a brick to a replica
> volume (Gluster 3.7.6)
>
> Been running through my eternal testing regime ... and experimenting with
> removing/adding bricks - to me, a necessary part of volume maintenance for
> dealing with failed disks. The datastore is a VM host and all the following
> is done live. Sharding is active with a 512MB shard size.
>
> So I started off with a replica 3 volume:
>
> // recreated from memory
> Volume Name: datastore1
> Type: Replicate
> Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vnb.proxmox.softlog:/vmdata/datastore1
> Brick2: vng.proxmox.softlog:/vmdata/datastore1
> Brick3: vna.proxmox.softlog:/vmdata/datastore1
>
> I remove a brick with:
>
> gluster volume remove-brick datastore1 replica 2 vng.proxmox.softlog:/vmdata/datastore1 force
>
> so we end up with:
>
> Volume Name: datastore1
> Type: Replicate
> Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: vna.proxmox.softlog:/vmdata/datastore1
> Brick2: vnb.proxmox.softlog:/vmdata/datastore1
>
> All well and good. No heal issues, VMs running OK.
>
> Then I clean the brick off the vng host:
>
> rm -rf /vmdata/datastore1
>
> I then add the brick back with:
>
> gluster volume add-brick datastore1 replica 3 vng.proxmox.softlog:/vmdata/datastore1
>
> Volume Name: datastore1
> Type: Replicate
> Volume ID: bf882533-f1a9-40bf-a13e-d26d934bfa8b
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vna.proxmox.softlog:/vmdata/datastore1
> Brick2: vnb.proxmox.softlog:/vmdata/datastore1
> Brick3: vng.proxmox.softlog:/vmdata/datastore1
>
> This recreates the brick directory "datastore1". Unfortunately this is where
> things start to go wrong :( Heal info:
>
> gluster volume heal datastore1 info
> Brick vna.proxmox.softlog:/vmdata/datastore1
> /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57
> /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
> Number of entries: 2
>
> Brick vnb.proxmox.softlog:/vmdata/datastore1
> /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
> /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.57
> Number of entries: 2
>
> Brick vng.proxmox.softlog:/vmdata/datastore1
> /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.1
> /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.6
> /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.15
> /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.18
> /.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5
>
> It's my understanding that there shouldn't be any heal entries on vng, as
> that is where all the shards should be sent *to*.

Lindsay,

Heal _is_ necessary when you add a brick that changes the replica count from n to (n+1). The new brick, although now part of the existing replica set, is lagging behind the existing bricks and needs to be brought in sync with them. All files and directories on vna and/or vnb will be healed to vng in your case.

> Also, running qemu-img check on the hosted VM images results in an I/O error.
> Eventually the VMs themselves crash - I suspect this is due to individual
> shards being unreadable.
>
> Another odd behaviour I get is that if I run a full heal on vnb, I get the
> following error:
>
> Launching heal operation to perform full self heal on volume datastore1 has
> been unsuccessful
>
> However, if I run it on vna, it succeeds.

Yes, there is a bug report for this at https://bugzilla.redhat.com/show_bug.cgi?id=1112158. The workaround, as you yourself figured out, is to run the command on the node with the highest UUID.

Steps (a scripted sketch of these steps follows at the end of this message):
1) Collect the output of `cat /var/lib/glusterd/glusterd.info | grep UUID` from each of the nodes, perhaps into a file named 'uuid.txt'.
2) cat uuid.txt | sort
3) Pick the last UUID.
4) Find out which node's glusterd.info file has the same UUID as the one you selected.
5) Run 'heal full' on that node.

Let me know if this works for you.

-Krutika

> Lastly - if I remove the brick, everything returns to normal immediately.
> Heal info shows no issues and qemu-img check returns no errors.
>
> --
> Lindsay Mathieson
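A minimal scripted sketch of steps 1-5 above, assuming the three nodes from this thread (vna, vnb, vng) and passwordless SSH between them; hostnames and the SSH loop are illustrative, not from the original messages:

    #!/bin/sh
    # Step 1: collect each node's glusterd UUID into uuid.txt.
    for node in vna.proxmox.softlog vnb.proxmox.softlog vng.proxmox.softlog; do
        ssh "$node" 'grep UUID /var/lib/glusterd/glusterd.info'
    done > uuid.txt

    # Steps 2-3: sort the UUIDs and print the last (highest) one.
    sort uuid.txt | tail -n 1

    # Steps 4-5: on whichever node's /var/lib/glusterd/glusterd.info contains
    # that UUID, trigger the full heal:
    #   gluster volume heal datastore1 full

The node whose glusterd.info matches the highest UUID is the one where the full heal should launch successfully.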
Lindsay Mathieson
2016-Jan-18 14:08 UTC
[Gluster-users] Issues removing then adding a brick to a replica volume (Gluster 3.7.6)
On 18/01/2016 10:24 PM, Krutika Dhananjay wrote:
> Heal _is_ necessary when you add a brick that changes the replica
> count from n to (n+1). The new brick, although now part of the existing
> replica set, is lagging behind the existing bricks and needs to be
> brought in sync with them. All files and directories on vna and/or vnb
> will be healed to vng in your case.

Yes, I realise that :) I was under the impression that heal info lists the files that are the *source* of the heal, i.e. all the files on vna & vnb; vng, which is blank, should only be receiving files.

> Yes, there is a bug report for this at
> https://bugzilla.redhat.com/show_bug.cgi?id=1112158.
> The workaround, as you yourself figured out, is to run the command on
> the node with the highest UUID.
> Steps:
> 1) Collect the output of `cat /var/lib/glusterd/glusterd.info | grep UUID`
> from each of the nodes, perhaps into a file named 'uuid.txt'.
> 2) cat uuid.txt | sort
> 3) Pick the last UUID.
> 4) Find out which node's glusterd.info file has the same UUID as the one
> you selected.
> 5) Run 'heal full' on that node.

Will do. And I still have the problem with the files becoming unreadable when adding a brick.

Thanks,

--
Lindsay Mathieson
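One way to check which bricks actually hold the pending-heal markers for a given shard is to inspect its AFR changelog xattrs on each brick. This is only a sketch: the shard path is taken from the earlier heal info output, and the trusted.afr.datastore1-client-* naming (with indices following the brick order shown by gluster volume info) is an assumption, not something confirmed in this thread:

    # Run as root on each brick host (vna, vnb, vng):
    getfattr -d -m . -e hex \
        /vmdata/datastore1/.shard/d6aad699-d71d-4b35-b021-d35e5ff297c4.5

    # Non-zero trusted.afr.datastore1-client-N values on a brick mean that
    # brick is recording pending operations against brick N, i.e. it sees
    # itself as a heal source and brick N as a sink.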
Lindsay Mathieson
2016-Jan-19 08:22 UTC
[Gluster-users] Issues removing then adding a brick to a replica volume (Gluster 3.7.6)
On 18/01/16 22:24, Krutika Dhananjay wrote:
>> However, if I run it on vna, it succeeds.
>
> Yes, there is a bug report for this at
> https://bugzilla.redhat.com/show_bug.cgi?id=1112158.
> The workaround, as you yourself figured out, is to run the command on
> the node with the highest UUID.
> Steps:
> 1) Collect the output of `cat /var/lib/glusterd/glusterd.info | grep UUID`
> from each of the nodes, perhaps into a file named 'uuid.txt'.
> 2) cat uuid.txt | sort
> 3) Pick the last UUID.
> 4) Find out which node's glusterd.info file has the same UUID as the one
> you selected.
> 5) Run 'heal full' on that node.
>
> Let me know if this works for you.

Yup, that confirms that the node the full heal worked on (vna) has the highest UUID.

When I added the new vng brick and ran the full heal from vna, it all worked as expected: all 80 shards were healed *to* vng.

Could that same bug mess with adding bricks as well? When I tested this earlier from the vnb node, shards were being healed from the empty brick (vng) to vna & vnb.

--
Lindsay Mathieson
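For reference, a minimal sketch of the sequence that worked in the end. The add-brick and heal commands are the ones used earlier in the thread; the watch loop is just one (assumed) way to confirm the pending entries drain as shards flow to vng:

    # Re-add the brick (replica 2 -> 3):
    gluster volume add-brick datastore1 replica 3 vng.proxmox.softlog:/vmdata/datastore1

    # Trigger the full heal from the highest-UUID node (vna in this thread):
    gluster volume heal datastore1 full

    # Watch the per-brick pending entry counts until they reach zero:
    watch -n 10 'gluster volume heal datastore1 info | grep -E "Brick|Number of entries"'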