BJ Quinn
2017-May-23 20:28 UTC
[Gluster-users] temporarily remove a failed brick on distribute only volume
I've got 3 systems I'd like to set up as bricks for a distribute-only volume. I understand that if one of them fails, the volume will still read and write files that hash to the non-failed bricks. What I'm wondering is whether I could forcibly remove the failed brick from the volume, have the remaining two systems fix the layout so it involves only the two surviving bricks, and then resume writes for all filenames (see the sketch below for roughly what I have in mind).

The reasoning is that the failed brick would be expected to come back up eventually. Each system is internally redundant, at least from a data standpoint (ZFS RAIDZ2), and the sorts of failures that would take a brick down are unlikely to be permanent and irrecoverable, but they might keep the system out of commission long enough (say, a fried motherboard) that I wouldn't want the volume in that state for the whole time it takes to repair the failed system.

This is a write-mostly workload, and the writes always go to new files. With my workload, there is no risk of duplicated file names while the failed brick is away. If 1/3 of the files disappeared for a little while and came back later, that would be acceptable as long as I could write to the volume in the interim. It's fine if this requires manual intervention.

My theory on how to bring the brick back would simply be to re-add the repaired brick, with all its existing data intact, and then do the trick where you run "find" over everything to get Gluster to recognize all the files again (a rough sketch of that step is below as well). At that point, you'd be back in business with the whole volume.
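For concreteness, here's roughly the removal step I'm imagining, assuming a volume named "myvol" with bricks at server1/server2/server3:/data/brick (all names are placeholders, and I'm not certain these are exactly the right incantations):

    # server3 has failed; drop its brick without trying to migrate data off it
    gluster volume remove-brick myvol server3:/data/brick force

    # recalculate the DHT layout so new files hash only to the two surviving bricks
    gluster volume rebalance myvol fix-layout start
    gluster volume rebalance myvol status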
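And the recovery step, once server3 is repaired, would be something like the following (again just a sketch; /mnt/myvol is a placeholder for a client mount of the volume, and I gather add-brick may object to a path that was previously part of a volume, hence the force and the xattr note):

    # add the repaired brick back with its old data still in place
    # (I may first need to clear the stale volume-id xattr on the brick root, e.g.
    #  setfattr -x trusted.glusterfs.volume-id /data/brick)
    gluster volume add-brick myvol server3:/data/brick force

    # fix the layout again so the third brick participates in hashing
    gluster volume rebalance myvol fix-layout start

    # walk the volume from a client mount so Gluster looks up every file
    # and re-recognizes the ones sitting on the returned brick
    find /mnt/myvol -exec stat {} \; > /dev/null

Does that sound workable, or is there a better-supported way to handle a temporarily missing brick on a pure distribute volume?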