Anirban Ghoshal
2013-Sep-03 17:45 UTC
[Gluster-users] On the ways to detach a brick gracefully from a glusterfs volume while rebooting a node
Hello,

We are using GlusterFS 3.4.0 and we have a replicated volume with one brick each on two real-time servers. For certain maintenance purposes, it may be desirable to periodically reboot them. During said reboots, we wish to umount the brick residing on the server being rebooted. However, umount fails (as expected) because of the GlusterFS threads that are using it. We thought of the following ways to counter this:

a) Stop the volume, thereby causing its GlusterFS threads to terminate. However, this would mean that the other server could not access the volume, which will be a problem.

b) Kill the GlusterFS threads on the volume, thereby allowing umount to proceed. However, I am given to understand that this method is not very graceful and may lead to data loss in case some local modifications have not yet synced onto the other server.

c) Delete the brick from the volume, remove its "trusted.glusterfs.volume-id" xattr, and then re-add it once the server comes back up.

Could you help me with some advice on what would be the best way to do it? Thanks in advance for answering this!
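For concreteness, option (c) with placeholder names (a volume "testvol" with a brick at server1:/export/brick1; substitute your own) would look roughly like this:

    # Drop the brick from the replica pair (replica count goes from 2 to 1).
    gluster volume remove-brick testvol replica 1 server1:/export/brick1 force

    # Clear the volume-id xattr so the brick directory can be re-added later.
    setfattr -x trusted.glusterfs.volume-id /export/brick1

    # ... reboot / maintenance ...

    # Re-add the brick and let self-heal copy the data back over.
    gluster volume add-brick testvol replica 2 server1:/export/brick1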
Joe Julian
2013-Sep-03 18:30 UTC
[Gluster-users] On the ways to detach a brick gracefully from a glusterfs volume while rebooting a node
On 09/03/2013 10:45 AM, Anirban Ghoshal wrote:
> We are using GlusterFS 3.4.0 and we have a replicated volume with one
> brick each on two real-time servers. For certain maintenance purposes,
> it may be desirable to periodically reboot them. During said reboots,
> we wish to umount the brick residing on it. However, umount fails (as
> expected), because of the GlusterFS threads that are using it. We
> thought of the following ways to counter this:
>
> a) Stop the volume, thereby causing its GlusterFS threads to
> terminate. However, this will mean that the other server would not be
> able to access the volume, which will be a problem.
>
> b) Kill the glusterFS threads on the volume, thereby allowing umount
> to proceed. However, I am given to understand that this method is not
> very graceful, and may lead to data loss in case some local
> modifications have not synced onto the other server.
>
> c) Delete the brick from the volume, remove its
> "trusted.glusterfs.volume-id", and then re-add it once the server
> comes back up.
>
> Could you help me with some advice on what would be the best way to
> do it?

The brick service is glusterfsd, so that is what needs to be killed. What I like to do is:

1. Kill the brick process for that brick. I personally use "pkill -f $brick_path", since the only application I have running with the brick path in its command-line options is glusterfsd. Do not use "pkill -9": that terminates glusterfsd without shutting down its TCP connections, leaving your clients hanging for ping-timeout seconds.

2. Perform your maintenance.

3. Start the brick(s) for that volume again with "gluster volume start $vol force". Any files that were changed during the downtime will be self-healed.
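Putting those steps together, a rough sketch with placeholder names (volume "testvol", brick at /export/brick1; adjust to your layout) might be:

    # Kill only the brick process for this brick; glusterd and client mounts stay up.
    pkill -f /export/brick1        # plain SIGTERM, *not* pkill -9

    # Unmount the brick filesystem and do the maintenance / reboot.
    umount /export/brick1
    #   ... maintenance ...
    mount /export/brick1

    # Restart the missing brick process without disturbing the rest of the volume.
    gluster volume start testvol force

    # Optionally watch self-heal catch up on files changed during the downtime.
    gluster volume heal testvol info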