Hi,
I had a 4 x 3 gluster volume distributed over 6 servers (2 bricks from each). I
wanted to move to a 4 x 2 volume, removing two nodes. The initial config is here:
http://fpaste.org/txxs/
I asked on the gluster IRC for the command to do this and then proceeded to run
it:
gluster volume remove-brick home0 replica 2 192.168.1.243:/d35 \
    192.168.1.240:/d35 192.168.1.243:/d36 192.168.1.240:/d36
Having read the gluster help output, I ascertained I should probably append
"start" so it would gracefully check everything (run without it, the command
did warn me of possible data loss). However, the result was that it started
rebalancing and immediately reconfigured the volume into 6 x 2 replica sets, so
now I have a HUGE mess:
http://fpaste.org/EpKG/
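For reference, the staged remove-brick lifecycle as I understood it from the
help output is roughly the following (paraphrased from memory, so treat the
exact syntax as approximate; <same bricks> stands for the four brick paths
above):

    # kick off data migration away from the listed bricks
    gluster volume remove-brick home0 replica 2 192.168.1.243:/d35 \
        192.168.1.240:/d35 192.168.1.243:/d36 192.168.1.240:/d36 start

    # poll migration progress on the same brick list
    gluster volume remove-brick home0 replica 2 <same bricks> status

    # finalize only once migration reports complete
    gluster volume remove-brick home0 replica 2 <same bricks> commit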
Most processes failed and directory listings show entries in duplicate:
[root@wn-c-27 test]# ls
ls: cannot access hadoop-fuse-addon.tgz: No such file or directory
ls: cannot access hadoop-fuse-addon.tgz: No such file or directory
etc hadoop-fuse-addon.tgz hadoop-fuse-addon.tgz
[root@wn-c-27 test]#
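To poke at the duplicates I was planning to inspect the entries directly on the
bricks, along these lines (the brick-side paths are my guess, assuming test/
sits at the top of each brick):

    # on each server, check how the file sits on the brick itself
    ls -l /d35/test/hadoop-fuse-addon.tgz /d36/test/hadoop-fuse-addon.tgz

    # dump the gluster xattrs (gfid and the afr/dht metadata) to compare copies
    getfattr -d -m . -e hex /d35/test/hadoop-fuse-addon.tgz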
I urgently need help recovering from this state. Gluster now has me in a huge
mess and it will be tough to get out of it. As soon as I noticed this I stopped
the remove-brick with the stop command, but the mess remains. Should I force the
remove-brick? Should I stop the volume and gluster and manually reconfigure it
back to 4 x 3, or how else can I recover to a consistent filesystem?
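For completeness, the stop I issued was along these lines (again from memory,
so the exact arguments are approximate):

    # halt the in-flight migration on the same brick set
    gluster volume remove-brick home0 replica 2 192.168.1.243:/d35 \
        192.168.1.240:/d35 192.168.1.243:/d36 192.168.1.240:/d36 stop

    # check what layout the volume has ended up with
    gluster volume info home0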
This is the users' /home, so a huge mess is NOT a good thing, and because of
the 3x replication there is no separate backup right now either...
Mario Kadastik, PhD
Researcher
---
"Physics is like sex, sure it may have practical reasons, but that's
not why we do it"
-- Richard P. Feynman