Don Spidell
2011-Oct-04 20:14 UTC
[Gluster-users] Gluster on EC2 - how to replace failed EBS volume?
Hi all,

Apologies if this has been asked and answered, however I couldn't find the
answer anywhere. Here's my situation: I am trying to make a highly available
1TB data volume on EC2. I'm using Gluster 3.1.3 on EC2 and have a replicated
volume consisting of two bricks. Each brick is in a separate Availability
Zone and consists of eight 125GB EBS volumes in a RAID 0 array. (Total usable
space presented to the Gluster client is 1TB.)

My question is: what is the best practice for replacing a failing/failed EBS
volume? It seems that I have two choices:

1. Remove the brick from the Gluster volume, stop the array, detach the 8
   vols, make new vols from the last good snapshot, attach new vols, restart
   the array, re-add the brick to the volume, perform self-heal.

2. Remove the brick from the Gluster volume, stop the array, detach the 8
   vols, make brand new empty volumes, attach new vols, restart the array,
   re-add the brick to the volume, perform self-heal. Seems like this one
   would take forever and kill performance.

Or maybe there's a third option that's even better?

Thanks so much,
Don
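(As a rough illustration of the EBS/mdadm portion of option 1, assuming the
classic ec2-api-tools, an mdadm RAID 0 array behind the brick, and a client
mount for the heal; every ID, device name, and path below is a placeholder,
so treat it as a sketch rather than a recipe.)

    # Detach the failed EBS volume and bring up a replacement from the last
    # good snapshot (volume/snapshot IDs and the zone are placeholders).
    ec2-detach-volume vol-11111111 --force
    ec2-create-volume --snapshot snap-22222222 -z us-east-1a
    ec2-attach-volume vol-33333333 -i i-44444444 -d /dev/sdf

    # Reassemble and remount the RAID 0 array that backs the brick
    # (device names depend on how the instance exposes the volumes).
    mdadm --stop /dev/md0
    mdadm --assemble /dev/md0 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi \
                              /dev/xvdj /dev/xvdk /dev/xvdl /dev/xvdm
    mount /dev/md0 /export/brick1

    # Trigger self-heal from a client mount once the brick is back
    # (the stat-every-file method documented for Gluster 3.1/3.2).
    find /mnt/glustervol -noleaf -print0 | xargs --null stat > /dev/null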
Olivier Nicole
2011-Oct-05 02:37 UTC
[Gluster-users] Gluster on EC2 - how to replace failed EBS volume?
Hi Don,

> 1. Remove the brick from the Gluster volume, stop the array, detach the 8
>    vols, make new vols from last good snapshot, attach new vols, restart
>    array, re-add brick to volume, perform self-heal.
>
> or
>
> 2. Remove the brick from the Gluster volume, stop the array, detach the 8
>    vols, make brand new empty volumes, attach new vols, restart array,
>    re-add brick to volume, perform self-heal. Seems like this one would
>    take forever and kill performance.

I am very new to Gluster, but I would think that solution 2 is the safest:
you don't mix up the rebuild from two different sources; only Gluster is
involved in rebuilding. That said, I have read that you can self-heal with a
time parameter to limit the find to the files that were modified since your
brick was offline, so I believe that could be extended to the time since your
snapshot.

Instead of configuring your 8 disks in RAID 0, I would use JBOD and let
Gluster do the concatenation. That way, when you replace a disk, you only
have 125 GB to self-heal.

Best regards,

Olivier
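(A sketch of the JBOD-style layout Olivier describes, using Gluster 3.1's CLI
with made-up hostnames, paths, and volume name; each EBS volume becomes its
own brick, and the distribute layer, rather than mdadm, spreads files across
the replica pairs. The time-limited heal is shown with find's -mtime, which
is one way to apply the "time parameter" idea; adjust it to the outage
window.)

    # One brick per 125GB EBS volume, replicated pairwise across the two
    # servers/AZs (bricks are paired in the order listed).
    gluster volume create datavol replica 2 transport tcp \
        serverA:/export/disk1 serverB:/export/disk1 \
        serverA:/export/disk2 serverB:/export/disk2 \
        serverA:/export/disk3 serverB:/export/disk3 \
        serverA:/export/disk4 serverB:/export/disk4 \
        serverA:/export/disk5 serverB:/export/disk5 \
        serverA:/export/disk6 serverB:/export/disk6 \
        serverA:/export/disk7 serverB:/export/disk7 \
        serverA:/export/disk8 serverB:/export/disk8
    gluster volume start datavol

    # After swapping one EBS volume, heal only files changed since the disk
    # went bad (-mtime is in days; here, the last day).
    find /mnt/datavol -noleaf -mtime -1 -print0 | xargs --null stat > /dev/null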
Don Spidell
2011-Oct-12 15:26 UTC
[Gluster-users] how to swap out failed brick in distributed replicated setup?
Hi all,

I have a distributed replicated Gluster 3.1.3 setup on Amazon EC2. I have
eight 125GB EBS volumes attached to an instance in one Availability Zone, and
eight 125GB EBS volumes attached to an instance in another AZ. Both instances
are peers. The 16 volumes are presented as one 1TB volume to my Gluster
client instance.

Say an EBS volume fails or gets hung up. I need to know the procedure for
swapping it out for a fresh volume. I tried just unmounting it, but Gluster
freaked out. I looked at the remove brick and migrate brick commands, but I
want to be able to swap out a volume while Gluster is online and avoid any
data loss.

Thanks in advance for your insights.

Don
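(For the online swap, Gluster 3.1's replace-brick is the command that
migrates one brick's data to a new location while the volume stays available.
The sketch below uses a made-up volume name and brick paths, and assumes the
replacement EBS volume is already attached, formatted, and mounted.)

    # Start migrating the ailing brick's contents to the new brick while the
    # volume remains online (volume name, host, and paths are placeholders).
    gluster volume replace-brick datavol \
        serverA:/export/disk3 serverA:/export/disk3-new start

    # Poll until the migration reports complete, then make it permanent.
    gluster volume replace-brick datavol \
        serverA:/export/disk3 serverA:/export/disk3-new status
    gluster volume replace-brick datavol \
        serverA:/export/disk3 serverA:/export/disk3-new commit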