Greene, Tami McFarlin
2019-Apr-15 13:09 UTC
[Gluster-users] Difference between processes: shrinking volume and replacing faulty brick
We need to remove a server node from our configuration (distributed volume). There is more than enough space on the remaining bricks to accept the data attached to the failing server; we didn't know whether one process or the other would be significantly faster. We know that shrinking the volume (remove-brick) rebalances as it moves the data, so moving 506G resulted in the rebalancing of 1.8T and took considerable time.

Reading the documentation, it seems that replacing a brick simply introduces an empty brick to accept the displaced data, but it is the exact same process: remove-brick.

Is there any way to migrate the data without rebalancing at the same time, and then rebalance once all the data has been moved? I know that is not ideal, but it would allow us to remove the problem server much more quickly and resume production while rebalancing.

Tami

Tami McFarlin Greene
Lab Technician
RF, Communications, and Intelligent Systems Group
Electrical and Electronics System Research Division
Oak Ridge National Laboratory
Bldg. 3500, Rm. A15
greenet at ornl.gov
(865) 643-0401
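For readers following along, the two procedures compared above can be sketched as the sequence of gluster CLI calls they involve. This is only an illustration, not the poster's actual commands: the volume name and brick paths are hypothetical, the helper `brick_migration_commands` is invented for this sketch, and it assumes the usual `add-brick` and `remove-brick ... start|status|commit` subcommands. The function only builds the command lines and runs nothing.

```python
from typing import List, Optional

GLUSTER = "gluster"  # assumption: the gluster CLI is on PATH


def brick_migration_commands(volume: str, old_brick: str,
                             new_brick: Optional[str] = None) -> List[List[str]]:
    """Build the argv lists for draining old_brick out of a distributed volume.

    With new_brick set, an empty brick is added first (the "replace" case);
    without it, the volume is simply shrunk. Nothing is executed here.
    """
    commands: List[List[str]] = []
    if new_brick is not None:
        # Introduce the empty brick that will receive displaced data.
        commands.append([GLUSTER, "volume", "add-brick", volume, new_brick])
    remove = [GLUSTER, "volume", "remove-brick", volume, old_brick]
    commands.append(remove + ["start"])    # begins migrating data off old_brick
    commands.append(remove + ["status"])   # repeat until migration shows completed
    commands.append(remove + ["commit"])   # only after status reports completion
    return commands


if __name__ == "__main__":
    # Hypothetical volume and brick names, for illustration only.
    for argv in brick_migration_commands("labvol",
                                         "server3:/bricks/b1",
                                         "server4:/bricks/b1"):
        print(" ".join(argv))
```

Either way the data leaves the old brick through the same remove-brick steps, which is the point made above: the only difference is whether an add-brick comes first.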
Poornima Gurusiddaiah
2019-Apr-16 06:54 UTC
[Gluster-users] Difference between processes: shrinking volume and replacing faulty brick
Do you have a plain distributed volume without any replication? If so, replace-brick should copy the data on the faulty brick to the new brick, unless there is some old data which would also need a rebalance. Doing an add-brick followed by a remove-brick and a rebalance is inefficient; I think we should just have the old brick's data copied to the new brick, and rebalance the whole volume when necessary. Adding the distribute experts to the thread.

If you are OK with downtime, an xfsdump and restore of the faulty brick, followed by re-forming the volume, may be faster.

Regards,
Poornima
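A rough sketch of the offline xfsdump route mentioned above, assuming the brick sits on its own mounted XFS filesystem. The mount points, dump file path, labels and the helper name `xfs_copy_commands` are all hypothetical. The step of re-forming the volume afterwards is left out because the thread does not spell it out. The function only builds the command lines and does not run them.

```python
from typing import List


def xfs_copy_commands(src_mount: str, dump_file: str,
                      dest_dir: str) -> List[List[str]]:
    """Build argv lists for a full dump of one XFS filesystem and its restore.

    Assumes the volume is stopped (downtime) so the brick is not changing.
    """
    dump = [
        "xfsdump",
        "-l", "0",             # level 0: full dump
        "-L", "brick-dump",    # session label (hypothetical), avoids a prompt
        "-M", "brick-media",   # media label (hypothetical), avoids a prompt
        "-f", dump_file,       # destination dump file
        src_mount,             # mount point of the faulty brick's filesystem
    ]
    restore = ["xfsrestore", "-f", dump_file, dest_dir]
    return [dump, restore]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for argv in xfs_copy_commands("/bricks/b1", "/backup/b1.dump", "/bricks/new_b1"):
        print(" ".join(argv))
```

This copies the brick contents as a plain filesystem operation, with no rebalance involved, which is why it might be quicker when downtime is acceptable.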