Joshua Saayman
2010-Oct-25 11:52 UTC
[Gluster-users] GlusterFS 3.1 on Amazon EC2 Challenge
Another GlusterFS 3.1 question on my blog (http://cloudarchitect.posterous.com). Any help/ideas will be appreciated.

Thanks
Joshua

----

Here's my challenge: I have several 1 TB EBS volumes now that are un-replicated and reaching capacity. I'm trying to suss out the most efficient way to get each one of these into its own replicated 4 TB GlusterFS volume.

My hope was that I could snapshot each one, restore the snapshot twice, and launch that pair as a pre-replicated GlusterFS volume, where the 'heal' process (find . | xargs stat) lets the gluster daemon rationalize the situation, and then add a second pair of empty bricks to grow on. As you know, all of this can be done in just a few minutes.

Well, I have now tried this, and I'm afraid I've got a goopy mess... so much for -that- shortcut. Rather than try to debug the situation, I'm curious whether there is a better high-speed import strategy. A dd for gluster? Any thoughts?

If all else fails, I'm happy to create the naked file system and just do an rsync, but the last time I did this (when I was testing out LVM) the rsync took 3 days. In general, I'm thinking of this exercise not just as a migration but also as a test of emergency restore, and a 3-day emergency restore is an awfully long time.

And one more time, thanks for this informative series. I'm curious where you go for your info (other than senior engineers at Gluster!)... This topic - Gluster on EBS - seems remarkably sparsely covered for all its massive applicability to cloud infrastructure... is there a Gluster-on-EBS group somewhere? It's not as though either technology is brand new...

Regards,
Gart
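P.S. To be clear, the rsync fallback I have in mind is nothing fancier than copying from the old EBS file system through a client mount of the new Gluster volume, roughly like this (the hostname, volume name, and paths are placeholders):

    # Mount the new Gluster volume with the native (FUSE) client, then copy
    # the data into it from the old EBS mount
    mount -t glusterfs gfs1:/thumbs /mnt/thumbs
    rsync -a -S --numeric-ids /mnt/old-ebs/ /mnt/thumbs/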
After painful experience, we have found that the only way to do what you are trying is to add new, empty volumes to GlusterFS and let it self-heal the files onto them with find /mnt/gfs -ls. Neither starting with a snapshot nor rsync'ing (even with rsync 3's -X option) to "speed up" the process helps; it generally just ends up making a mess. Additionally, for it to work reliably, you need to shut off any large production load on the system... which for 1 TB volumes on EBS is unfortunately going to mean a lot of downtime for you.

Scenarios in which we've had to use this blank-volume approach:

* Recovering when one of the EBS volumes in a replica pair fails; we start with a blank volume and self-heal from the remaining volume (rough commands in the P.S. below).

* Adding a new volume to a distribute volume so as to distribute over more disk space. In this case you have to start by adding a blank volume and then also go through the "re-balancing" process... and note that GlusterFS is not always very balanced in the way it distributes files across distribute subvolumes.

* Removing a volume from a distribute volume. In this case we were running distribute on top of replicate and decided to just use larger replicate volumes instead of distribute (because of the uneven way distribute was distributing files, we were going to run out of space in the GlusterFS volume even though one of the underlying bricks was nearly empty, even after "re-balancing").

Barry
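P.S. In case it helps, the blank-volume recovery in the first bullet looks roughly like this on GlusterFS 3.1. The hostname (gfs1), volume name (myvol), and mount point are placeholders; adapt them to your own setup.

    # One brick of the replica pair has died: attach a fresh, blank EBS
    # volume, put a file system on it, and mount it at the same brick path
    # on the affected server, then confirm glusterd and the volume are up:
    gluster volume info myvol

    # From a client, mount the volume and walk the entire tree; reading the
    # metadata of each file forces self-heal from the surviving replica
    # onto the blank brick:
    mount -t glusterfs gfs1:/myvol /mnt/gfs
    find /mnt/gfs -ls > /dev/null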
--
Barry Jaspan
Senior Architect | Acquia <http://acquia.com>
barry.jaspan at acquia.com
Barry, Josh,

Thank you both for helping me out with this. Restating:

> Here's my challenge: I have several 1 TB EBS volumes now that are
> un-replicated and reaching capacity. I'm trying to suss out the most
> efficient way to get each one of these into its own replicated 4 TB
> GlusterFS volume.

To keep things simple, I'm going to focus on a single 1 TB EBS volume that is 85% full and contains about 2 million thumbnails. So, based on Barry's first bullet, here is my current working recipe (command-level sketch in the P.S. at the end):

1. Spin up two EC2 large instances, load Gluster, and peer them.
2. Snapshot the production EBS volume.
3. Restore the snapshot onto a new 1 TB EBS volume and mount it on one of the file servers.
4. Create a new blank 1 TB EBS volume and mount it on the other file server.
5. Create a single Gluster volume with 'replica 2', using the two volumes above as brick1 and brick2.
6. On a third machine, mount the Gluster volume using FUSE.
7. From that third machine, trigger the heal process using 'find . | xargs stat'.
8. Mount a blank 1 TB EBS volume on each file server, run add-brick to add these two additional bricks to the Gluster volume, and perform a rebalance.
9. All else working, shut down production, do a final rsync, and mount the Gluster volume in place of the production EBS volume.

My principal concern with this relates to Barry's third bullet: Gluster does not rebalance evenly, and so this solution will eventually bounce off the roof and lock up. Forgive my naivete, Barry: when you say 'just use larger replicate volumes instead of distribute', what does that mean? Are you running multiple 1 TB EBS bricks in a single 'replica 2' volume under a single file server?

My recipe is largely riffing off Josh's tutorial. You've clearly found a recipe that you're happy to entrust production data to... how would you change this?

Thanks!
Gart
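P.S. For concreteness, here is roughly how I expect steps 5-9 to look on the command line, assuming the two file servers are gfs1 and gfs2 with bricks under /data and the volume is called thumbs (all of those names, plus the mount points, are placeholders, and I haven't run this end to end yet):

    # Step 5: create and start the 2-way replicated volume over the
    # restored brick (on gfs1) and the blank brick (on gfs2)
    gluster volume create thumbs replica 2 transport tcp \
        gfs1:/data/brick1 gfs2:/data/brick1
    gluster volume start thumbs

    # Steps 6-7: mount on a third machine with the FUSE client and walk the
    # tree to trigger self-heal onto the blank brick
    mount -t glusterfs gfs1:/thumbs /mnt/thumbs
    cd /mnt/thumbs && find . | xargs stat > /dev/null

    # Step 8: add a second, empty pair of bricks and rebalance
    gluster volume add-brick thumbs gfs1:/data/brick2 gfs2:/data/brick2
    gluster volume rebalance thumbs start
    gluster volume rebalance thumbs status

    # Step 9: final catch-up sync from the production EBS mount, through
    # the Gluster client mount, before cutting over
    rsync -a --delete /mnt/production-ebs/ /mnt/thumbs/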