remove-brick in 3.4.0 seems to be removing the wrong bricks. Can someone help review my environment and steps to see whether I did anything stupid?

Setup: Ubuntu 12.04 LTS on gfs11 and gfs12, with the following packages from the PPA. Both nodes have three XFS partitions: sdb1, sdc1 and sdd1.

ii  glusterfs-client  3.4.0final-ubuntu1~precise1  clustered file-system (client package)
ii  glusterfs-common  3.4.0final-ubuntu1~precise1  GlusterFS common libraries and translator modules
ii  glusterfs-server  3.4.0final-ubuntu1~precise1  clustered file-system (server package)

Steps to reproduce the problem:

1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and gfs12:/sdb1
2. add-brick gfs11:/sdc1 and gfs12:/sdc1
3. add-brick gfs11:/sdd1 and gfs12:/sdd1
4. rebalance so that files are distributed across all three pairs of disks
5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start; files on ***/sdc1*** are migrating out
6. remove-brick commit led to data loss in gfs_v0

If, between steps 5 and 6, I initiate a remove-brick targeting /sdc1, then after the commit I do not lose anything, since all data is migrated back to /sdb1.

-C.B.
Ravishankar N
2013-Aug-13  04:51 UTC
[Gluster-users] remove-brick removed unexpected bricks
On 08/13/2013 03:43 AM, Cool wrote:
> remove-brick in 3.4.0 seems removing wrong bricks, can someone help to
> review the environment/steps to see if I did anything stupid?
>
> setup - Ubuntu 12.04LTS on gfs11 and gfs12, with following packages
> from ppa, both nodes have 3 xfs partitions sdb1, sdc1, sdd1:
> ii glusterfs-client 3.4.0final-ubuntu1~precise1
> clustered file-system (client package)
> ii glusterfs-common 3.4.0final-ubuntu1~precise1
> GlusterFS common libraries and translator modules
> ii glusterfs-server 3.4.0final-ubuntu1~precise1
> clustered file-system (server package)
>
> step to reproduce the problem:
> 1. create volume gfs_v0 in replica 2 with gfs11:/sdb1 and gfs12:/sdb1
> 2. add-brick gfs11:/sdc1 and gfs12:/sdc1
> 3. add-brick gfs11:/sdd1 and gfs12:/sdd1
> 4. rebalance to make files distributed to all three pair of disks
> 5. remove-brick gfs11:/sdd1 and gfs12:/sdd1 start, files on
> ***/sdc1*** are migrating out
> 6. remove-brick commit led to data loss in gfs_v0
>
> If between step 5 and 6 I initiate a remove-brick targeting /sdc1,
> then after commit I would not lose anything since all data will be
> migrated back to /sdb1.
>

You should ensure that a 'remove-brick start' has completed, and then commit it, before initiating the second one. The correct way to do this would be:

5. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 start

6. Check that the data migration has been completed using the status command:
   # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 status

7. # gluster volume remove-brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1 commit

8. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 start

9. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 status

10. # gluster volume remove-brick gfs_v0 gfs11:/sdc1 gfs12:/sdc1 commit

This would leave you with the original replica 2 volume that you had begun with. Hope this helps.

Note: The latest version of glusterfs has a check that prevents a second remove-brick operation until the first one has been committed. You would receive a message like this:

"volume remove-brick start: failed: An earlier remove-brick task exists for volume <volname>. Either commit it or stop it before starting a new task."

-Ravi
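
The "check status before commit" step in the reply above can be scripted. The following is only a rough sketch, not something from the thread: the function names, the GLUSTER and POLL_INTERVAL variables, and above all the assumption that the status output contains the words "in progress", "not started", "completed" or "failed" are illustrative and should be checked against the output of your own glusterfs version.

```shell
#!/bin/sh
# Sketch: wait until a remove-brick migration has finished before committing.
# GLUSTER and POLL_INTERVAL are hypothetical knobs added for illustration.
GLUSTER="${GLUSTER:-gluster}"
POLL_INTERVAL="${POLL_INTERVAL:-10}"

# classify_status reads remove-brick status output on stdin and prints
# one of: failed | in_progress | completed | unknown.
# Assumption: the status column uses the words matched below.
classify_status() {
    out=$(cat)
    case "$out" in
        *failed*) echo failed ;;
        *"in progress"*|*"not started"*) echo in_progress ;;
        *completed*) echo completed ;;
        *) echo unknown ;;
    esac
}

# wait_for_remove_brick VOLUME BRICK...
# Polls the status command. Returns 0 once no node is still migrating and
# at least one reports completion; returns 1 on failure or if the output
# cannot be interpreted.
wait_for_remove_brick() {
    vol=$1
    shift
    while :; do
        state=$("$GLUSTER" volume remove-brick "$vol" "$@" status 2>&1 | classify_status)
        case "$state" in
            completed) return 0 ;;
            in_progress) sleep "$POLL_INTERVAL" ;;
            *) return 1 ;;
        esac
    done
}
```

With these functions loaded, one would run the 'start' command by hand, then call "wait_for_remove_brick gfs_v0 gfs11:/sdd1 gfs12:/sdd1", and only run the 'commit' command if it returns 0. The commit is deliberately left as a manual step so that the status output can be inspected first.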