Pranith Kumar Karampuri
2016-Jan-25 14:54 UTC
[Gluster-users] Unexpected behaviour adding a third server
On 01/23/2016 02:17 PM, Steve Spence wrote:
> We've a simple two-server, one-volume arrangement, replicating ~340k
> files (15GB) between our web servers.
>
> The servers are in AWS, sat in different availability zones. One of
> the operations for this weekend is to add another pair of machines,
> one in each AZ.
>
> I've deployed the same OS image of the gluster server (3.6) and was
> under the impression I could add a brick to the existing replica
> simply by issuing the below:
>
> gluster volume add-brick volume1 replica 3 pd-wfe3:/gluster-store
>
> And then presumably I would add the fourth server by repeating the above
> with "replica 4" and the fourth server name.
>
> The operation appeared to succeed, and the brick appears alongside the others:
>
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: pd-wfe1:/gluster-store
> Brick2: pd-wfe2:/gluster-store
> Brick3: pd-wfe3:/gluster-store
>
> but almost immediately pd-wfe1 crept up to 100% CPU with the gluster
> processes, and nginx began timing out serving content from the volume.

Could you disable client-side healing?

gluster volume set <volname> cluster.entry-self-heal off
gluster volume set <volname> cluster.data-self-heal off
gluster volume set <volname> cluster.metadata-self-heal off

(A version of these commands with the volume name from this thread filled
in is sketched after the quoted text below.)

We are in the process of making this experience smoother for 3.8 by
introducing throttling of self-heal traffic and automatic healing.

+Anuradha,
Could you give him the steps he needs to perform after doing add-brick,
until the patch you sent is merged?

Pranith

> The glusterfs-glusterd-vol log is filled with this error at pd-wfe1:
>
> [2016-01-23 08:43:28.459215] W [socket.c:620:__socket_rwv]
> 0-management: readv on
> /var/run/c8bc2f99e7584cb9cf077c4f98d1db2e.socket failed (Invalid argument)
>
> while I see this error for the log named by the mount point:
>
> [2016-01-23 08:43:28.986379] W
> [client-rpc-fops.c:306:client3_3_mkdir_cbk] 2-volume1-client-2: remote
> operation failed: Permission denied. Path: (null)
>
> Does anyone have any suggestions how to proceed? I would appreciate
> any input on this one.
>
> Steve
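For reference, plugging the volume name from the add-brick command quoted
above (volume1) into the commands Pranith suggests, they would look roughly
like this; afterwards, 'gluster volume info volume1' should show the changed
options under "Options Reconfigured":

# run on any server in the trusted pool
gluster volume set volume1 cluster.entry-self-heal off
gluster volume set volume1 cluster.data-self-heal off
gluster volume set volume1 cluster.metadata-self-heal off

# verify the reconfigured options
gluster volume info volume1

Note that these options only turn off healing done from the client mounts;
the self-heal daemon is unaffected and still handles healing of the new brick.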
Anuradha Talur
2016-Jan-27 06:17 UTC
[Gluster-users] Unexpected behaviour adding a third server
----- Original Message -----
> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> To: "Steve Spence" <steve at pixeldynamo.com>, gluster-users at gluster.org
> Cc: "Anuradha Talur" <atalur at redhat.com>
> Sent: Monday, January 25, 2016 8:24:41 PM
> Subject: Re: [Gluster-users] Unexpected behaviour adding a third server
>
> On 01/23/2016 02:17 PM, Steve Spence wrote:
> > We've a simple two-server, one-volume arrangement, replicating ~340k
> > files (15GB) between our web servers.
> >
> > The servers are in AWS, sat in different availability zones. One of
> > the operations for this weekend is to add another pair of machines,
> > one in each AZ.
> >
> > I've deployed the same OS image of the gluster server (3.6) and was
> > under the impression I could add a brick to the existing replica
> > simply by issuing the below:
> >
> > gluster volume add-brick volume1 replica 3 pd-wfe3:/gluster-store
> >
> > And then presumably I would add the fourth server by repeating the above
> > with "replica 4" and the fourth server name.
> >
> > The operation appeared to succeed, and the brick appears alongside the others:
> >
> > Status: Started
> > Number of Bricks: 1 x 3 = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: pd-wfe1:/gluster-store
> > Brick2: pd-wfe2:/gluster-store
> > Brick3: pd-wfe3:/gluster-store
> >
> > but almost immediately pd-wfe1 crept up to 100% CPU with the gluster
> > processes, and nginx began timing out serving content from the volume.
>
> Could you disable client-side healing?
>
> gluster volume set <volname> cluster.entry-self-heal off
> gluster volume set <volname> cluster.data-self-heal off
> gluster volume set <volname> cluster.metadata-self-heal off
>
> We are in the process of making this experience smoother for 3.8 by
> introducing throttling of self-heal traffic and automatic healing.
>
> +Anuradha,
> Could you give him the steps he needs to perform after doing add-brick,
> until the patch you sent is merged?

Hi Steve,

Once you add bricks to a replicate volume such that the replica count is
increased (2 to 3 and then 3 to 4 in the case you mentioned), files need
to be healed from the pre-existing bricks to the newly added ones. The
process of triggering heals from old to new bricks is not automatic yet;
we have a patch for it under review. Meanwhile, you can follow the steps
below to trigger heals from the self-heal daemon. I'm assuming you have
added the third brick and are yet to add the fourth one.

1) Turn off client-side self-healing using the commands given by Pranith.

2) Kill the brick process of the 3rd brick (the newly added one).

3) On the new brick, i.e. pd-wfe3:/gluster-store, run:
   setfattr -n trusted.afr.dirty -v 0x000000000000000000000001 <path-to-brick>

4) Say the mount point used for this volume is /mnt; perform:
   a) touch /mnt/<non-existent-file>
   b) setfattr -n <user.non-existent-xattr> -v <1> /mnt
   c) rm /mnt/<non-existent-file>
   d) setfattr -x <user.non-existent-xattr> /mnt
   These operations set pending xattrs so that a heal onto the newly
   added brick is triggered.

5) Bring the brick back up with: gluster volume start <volname> force

6) Run: gluster volume heal <volname> from one of the servers.

You can monitor whether the files are being healed with
gluster volume heal <volname> info. A consolidated sketch of these steps
follows below.
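Put together as a single sequence, with the names from this thread filled
in (volume volume1, new brick pd-wfe3:/gluster-store, and a client mount
assumed at /mnt), the steps would look roughly like this; the dummy file
name and user xattr name are arbitrary placeholders:

# 1) turn off client-side self-heal (Pranith's commands)
gluster volume set volume1 cluster.entry-self-heal off
gluster volume set volume1 cluster.data-self-heal off
gluster volume set volume1 cluster.metadata-self-heal off

# 2) on pd-wfe3: find the PID of the newly added brick and kill only
#    that brick process (not glusterd)
gluster volume status volume1     # note the PID shown for pd-wfe3:/gluster-store
kill <pid-of-new-brick>

# 3) on pd-wfe3: mark the new brick as dirty
setfattr -n trusted.afr.dirty -v 0x000000000000000000000001 /gluster-store

# 4) on a client where the volume is mounted at /mnt: create and remove a
#    dummy file and a dummy user xattr ("dummy-heal" is just a placeholder)
touch /mnt/dummy-heal
setfattr -n user.dummy-heal -v 1 /mnt
rm /mnt/dummy-heal
setfattr -x user.dummy-heal /mnt

# 5) bring the killed brick back up
gluster volume start volume1 force

# 6) trigger the heal from any server, then monitor it
gluster volume heal volume1
gluster volume heal volume1 info

The list printed by the last command should shrink to zero entries as the
heal completes.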
Let me know if there is any clarification required.

> Pranith
>
> > The glusterfs-glusterd-vol log is filled with this error at pd-wfe1:
> >
> > [2016-01-23 08:43:28.459215] W [socket.c:620:__socket_rwv]
> > 0-management: readv on
> > /var/run/c8bc2f99e7584cb9cf077c4f98d1db2e.socket failed (Invalid argument)
> >
> > while I see this error for the log named by the mount point:
> >
> > [2016-01-23 08:43:28.986379] W
> > [client-rpc-fops.c:306:client3_3_mkdir_cbk] 2-volume1-client-2: remote
> > operation failed: Permission denied. Path: (null)
> >
> > Does anyone have any suggestions how to proceed? I would appreciate
> > any input on this one.
> >
> > Steve

--
Thanks,
Anuradha.