Brandon Ooi
2009-Jun-19 04:44 UTC
[Gluster-users] Rebalancing the Distribute Hash Table Translator - Possible solutions
Hi, I'd like to start a discussion on practical ways of rebalancing the DHT translator. I've thought of a few approaches and wanted to get some feedback. Obviously this is best done by Gluster itself, but alas it does not (yet?). If I'm being an idiot and there is an implementation already, please feel free to tell me :P

The problem - Growing a DHT filesystem is as easy as adding additional volumes. However, to make sure old files are still accessible, the hash layout is stored on the directory itself. This means any data added to old directories will only go onto the volumes that were available when the directory was created. It is therefore possible to have many empty disks and still run out of disk space.

1.) Automated recopy - A script traverses the filesystem, recopies each directory in full, deletes the original, and renames the new copy to the old name. This should rebalance the files, since the newly created directories take all current volumes into account.
Pros: Online rebalance. Effective. Fairly easy to implement; since it works through the Gluster interface, the chance of bugs is small.
Cons: If the ratio of new volumes to old is small, most of the I/O is wasted copying data between already-full nodes. Choosing which directories to copy is difficult.

2.) Background moves - Behind the scenes, run a script to move files from a full disk to an empty disk.
Pros: Easy to implement.
Cons: Cannot be combined with AFR, which any realistic cluster would be using. Might break the DHT.

3.) AFR-friendly background moves - Behind the scenes, run a script that simultaneously moves files from the 2+ full disks behind one AFR to the 2+ empty disks behind another AFR.
Pros: Effective even with AFR.
Cons: Hard to implement. Works behind the scenes, so it can potentially cause problems on a running cluster. Might break the DHT.

4.) Background moves between AFR volumes - This assumes you are running a DHT translator in front of a series of AFR volumes. Mount every AFR volume separately and move data from the fuller AFRs to the emptier ones.
Pros: Easy to implement.
Cons: Offline rebalance (probably).
Best case: forces the DHT to rehash the directory, and the new hash takes all current volumes into account.
Medium case: the DHT "heals" each file by moving it to the proper volume.
Worst case: broken DHT.

Anyway, just some quick thoughts on how to do it. The common use case, IMHO, is starting a cluster with a few drives and slowly growing it based on usage.

Brandon
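
P.S. For anyone who wants to experiment with option 1, here's a rough sketch as a shell script. It's purely illustrative: the paths are made up, and for the sake of a self-contained example it sets up a throwaway directory instead of a real Gluster mount. On a real cluster, MNT would be the Gluster mount point (never the backend bricks), and you'd want much more careful error handling before deleting anything.

```shell
#!/bin/sh
# Sketch of option 1 (automated recopy).  For illustration this runs
# against a throwaway directory created with mktemp; on a real cluster
# MNT would be the Gluster mount point, NOT the backend bricks.
set -e

MNT=${MNT:-$(mktemp -d)}     # stand-in for the Gluster mount point
DIR=${DIR:-photos}           # directory to rebalance

# Dummy data so this sketch is self-contained.
mkdir -p "$MNT/$DIR"
echo hello > "$MNT/$DIR/a.txt"

SRC=$MNT/$DIR
TMP=$MNT/$DIR.rebalance.$$

# Recreating the directory is the whole trick: the new directory gets
# a fresh hash layout covering all current volumes, so the copied
# files spread out across the new disks too.
mkdir "$TMP"
cp -a "$SRC/." "$TMP/"

# Swap the rebalanced copy into place.
rm -rf "$SRC"
mv "$TMP" "$SRC"
```

The rename at the end is the risky part: anything writing into the old directory during the copy gets lost, so in practice you'd want the directory quiesced (or at least re-synced) before the swap.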