B.K.Raghuram
2013-Oct-30 10:13 UTC
[Gluster-users] Strange behaviour with add-brick followed by remove-brick
I have gluster 3.4.1 on 4 boxes with hostnames n9, n10, n11, n12. I did the following sequence of steps and ended up losing data, so what did I do wrong?!

- Created a distributed volume with bricks on n9 and n10
- Started the volume
- NFS mounted the volume and created 100 files on it. Found that n9 had 45, n10 had 55
- Added a brick n11 to this volume
- Removed a brick n10 from the volume with gluster remove brick <vol> <n10 brick name> start
- n9 now has 45 files, n10 has 55 files and n11 has 45 files (all the same as on n9)
- Checked status: it shows no rebalanced files, but that n10 had scanned 100 files and completed. 0 scanned for all the others
- I then did a rebalance start force on the vol and found that n9 had 0 files, n10 had 55 files and n11 had 45 files. Weird: it looked like n9 had been removed, but I double checked and found that n10 had indeed been removed.
- Did a remove-brick commit. Same file distribution after that. volume info now shows the volume to have n9 and n11 as bricks.
- Did a rebalance start again on the volume. The rebalance status now shows n11 had 45 rebalanced files, all the brick nodes had 45 files scanned and all show complete. The file layout after this is n9 has 45 files and n10 has 55 files. n11 has 0 files!
- An ls on the nfs mount now shows only 45 files, so the other 55 are not visible because they are on n10, which is not part of the volume!

What have I done wrong in this sequence?
Lalatendu Mohanty
2013-Oct-30 15:10 UTC
[Gluster-users] Strange behaviour with add-brick followed by remove-brick
On 10/30/2013 03:43 PM, B.K.Raghuram wrote:
> What have I done wrong in this sequence?

I think running rebalance (force) in between "remove-brick start" and "remove-brick commit" is the issue. Can you please paste your commands in the order you ran them? That would make it clearer.

Below are the steps I use to replace a brick, and they work for me.

1. gluster volume add-brick VOLNAME NEW-BRICK
2. gluster volume remove-brick VOLNAME BRICK start
3. gluster volume remove-brick VOLNAME BRICK status
4. gluster volume remove-brick VOLNAME BRICK commit

-Lala
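The four steps above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: the function name migrate_off_brick, the GLUSTER and POLL_SECONDS variables, and the assumption that the status output contains the words "in progress", "completed" or "failed" are all illustrative and should be checked against what your gluster version actually prints. The point it illustrates is that nothing (in particular no rebalance) runs between "start" and "commit" except polling "status".

```shell
#!/bin/sh
# Sketch only: add a brick, migrate data off the old one, then commit.
# GLUSTER can be pointed at a stub for a dry run (hypothetical convention).
GLUSTER="${GLUSTER:-gluster}"

migrate_off_brick() {
    vol="$1"; new_brick="$2"; old_brick="$3"

    $GLUSTER volume add-brick "$vol" "$new_brick" || return 1
    $GLUSTER volume remove-brick "$vol" "$old_brick" start || return 1

    # Poll status; no rebalance is started between "start" and "commit".
    # The status keywords matched below are an assumption about the output.
    while :; do
        out=$($GLUSTER volume remove-brick "$vol" "$old_brick" status) || return 1
        case "$out" in
            *failed*)        echo "migration failed" >&2; return 1 ;;
            *"in progress"*) sleep "${POLL_SECONDS:-10}" ;;
            *completed*)     break ;;
            *)               echo "unrecognized status output" >&2; return 1 ;;
        esac
    done

    # The real CLI may ask for confirmation before committing.
    $GLUSTER volume remove-brick "$vol" "$old_brick" commit
}
```

It would be called as, for example, migrate_off_brick VOLNAME NEW-BRICK BRICK. Before committing it is still worth checking by hand that the files are visible on the mount.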
Lalatendu Mohanty
2013-Oct-30 15:21 UTC
[Gluster-users] Strange behaviour with add-brick followed by remove-brick
On 10/30/2013 08:40 PM, Lalatendu Mohanty wrote:
> Below are the steps, I do to replace a brick and it works for me.

I would also suggest using distribute-replicate volumes, so that you always have a replica copy; this reduces the probability of losing data.

-Lala
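As an illustration of the distribute-replicate suggestion, the sketch below only prints a create command rather than running it. The helper name make_create_cmd and the brick paths are hypothetical. With "replica 2", bricks are grouped into replica pairs in the order they are listed, so the brick count has to be a multiple of the replica count.

```shell
#!/bin/sh
# Sketch only: build (but do not run) a distribute-replicate create command.
make_create_cmd() {
    vol="$1"; replica="$2"; shift 2

    # Bricks are grouped into replica sets in the order given,
    # so their number must be a non-zero multiple of the replica count.
    if [ "$#" -eq 0 ] || [ $(( $# % replica )) -ne 0 ]; then
        echo "brick count must be a non-zero multiple of $replica" >&2
        return 1
    fi

    echo "gluster volume create $vol replica $replica $*"
}

# Hypothetical brick paths: n9/n10 form one pair, n11/n12 the other.
make_create_cmd VOLNAME 2 n9:/export/brick n10:/export/brick \
                          n11:/export/brick n12:/export/brick
```

With four bricks and replica 2, files are distributed across two pairs, and each file is stored on both bricks of its pair.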