Michael Peek
2013-Jul-31 17:19 UTC
[Gluster-users] Kosher admin practices: What do you do with failed heals? (and out-of-sync replicated bricks)
Hi gurus, I'm back with more shenanigans.

I've been testing a setup with four machines, two drives in each. While running an rsync to back up a bunch of files to the volume, I simulated a drive failure by forcing one of the drives to remount read-only. I then took Joe Julian's advice and brought the brick back online by:

1) Killing the glusterfsd that was running on this brick
2) Unmounting, fsck'ing, and remounting the drive (with a real drive failure, of course, I would be replacing the drive)
3) Typing "gluster volume start $vol force"

(The commands I used are sketched in the P.S. below.) It seemed to work wonderfully.

Next I decided to wipe the data on the volume with an "rm -fr". What I'm left with are a couple of directories that cannot be removed; I get a "Directory not empty" error. When I look at the bricks, the brick that I took offline has a file in each directory, whereas the replicated brick's directories are empty. Specifically, the files left behind are the transient files that rsync creates while it copies. They have a nonsensical file extension that looks like '.iPDK8i'. Once rsync finishes copying a file it renames it, removing the nonsensical extension. But since the brick in question was offline when rsync renamed the files, its versions of the files with the nonsense extension still exist. The use of rsync aside, were this a production volume with active users the same scenario could still have happened even without rsync. (In fact, I've created this type of scenario before without rsync by taking a brick offline while an "rm -fr" was running.)

Gluster reports no split-brain files, but does report some (35) failed heals. Next I ran "gluster volume heal $vol force". Since there are only two files on the whole volume I didn't expect this to take long, but I've left it alone for an hour. However, there's no way that I know of to check whether the healing process has completed. The command "gluster volume heal $vol info" still lists the two files in question as failed heals; everything else (the other 33 files reported by gluster earlier) has been taken care of.

So what's the correct way to fix this problem? I could just delete the files from the brick directly, but won't that still leave behind something in the .glusterfs/ metadata directory? (My guess at the cleanup is sketched below.) Does gluster have a mechanism to mark a brick as degraded and force a re-sync from its replica? I didn't see anything in the manual about such a mechanism, but maybe I missed it. What would happen if I simply used rsync to resync the replica brick's data, including the .glusterfs/ metadata directory, back onto the out-of-sync brick (also sketched below)? My guess is that such an approach would be disastrous on a running system unless I at least killed the gluster processes managing the two bricks.

Michael Peek
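P.S. For reference, here is roughly what I ran to bring the brick back online. The brick path, device, and PID are just examples from my test setup, not anything canonical:

  # Find the PID of the glusterfsd serving the failed brick
  gluster volume status $vol
  kill <pid-of-that-brick's-glusterfsd>

  # Take the drive offline, check it, and bring it back
  # (with a real failure I'd be swapping the drive instead)
  umount /export/brick1
  fsck -y /dev/sdb1
  mount /dev/sdb1 /export/brick1

  # Restart the missing brick process
  gluster volume start $vol force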
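If the answer is to delete the leftovers from the brick by hand, is the idea to remove both the file itself and its hardlink under .glusterfs/? This is my untested guess; the path and gfid below are made up for illustration:

  # On the out-of-sync brick, read the leftover file's gfid
  getfattr -n trusted.gfid -e hex /export/brick1/somedir/.somefile.iPDK8i
  # e.g. trusted.gfid=0xd2157ba4...
  # which should correspond to a hardlink at .glusterfs/d2/15/d2157ba4-...

  # Remove the file and its .glusterfs hardlink
  rm /export/brick1/somedir/.somefile.iPDK8i
  rm /export/brick1/.glusterfs/d2/15/d2157ba4-xxxx-xxxx-xxxx-xxxxxxxxxxxx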
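And this is roughly the rsync resync I was imagining. Purely hypothetical, I have not tried it, and I would only attempt it with the out-of-sync brick's glusterfsd stopped first (as above). Hostnames and paths are again just examples:

  # Copy everything from the good replica, preserving xattrs,
  # hardlinks, ACLs, and the .glusterfs/ directory
  rsync -aHAX --delete goodhost:/export/brick1/ /export/brick1/

  # Bring the brick process back
  gluster volume start $vol force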