Kingsley
2015-Aug-10 16:07 UTC
[Gluster-users] volume not working after yum update - gluster 3.6.3
On Mon, 2015-08-10 at 21:34 +0530, Atin Mukherjee wrote:
> -Atin
> Sent from one plus one
> On Aug 10, 2015 7:19 PM, "Kingsley" <gluster at gluster.dogwind.com> wrote:
> >
> > Further to this, the volume doesn't seem overly healthy. Any idea how I
> > can get it back into a working state?
> >
> > Trying to access one particular directory on the clients just hangs. If
> > I query heal info, that directory appears in the output as possibly
> > undergoing heal (actual directory name changed as it's private info):
> Can you execute strace and see which call is stuck? That would help us
> to get to the exact component which we would need to look at.

Hi,

I've never used strace before. Could you give me the command line to
type?

Then ... do I need to run something on one of the bricks while strace is
running?

Cheers,
Kingsley.

> >
> > [root@gluster1b-1 ~]# gluster volume heal callrec info
> > Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
> > <gfid:164f888f-2049-49e6-ad26-c758ee091863>
> > /recordings/834723/14391 - Possibly undergoing heal
> >
> > <gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f>
> > <gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e>
> > <gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c>
> > <gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb>
> > <gfid:650efeca-b45c-413b-acc3-f0a5853ccebd>
> > Number of entries: 7
> >
> > Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
> > Number of entries: 0
> >
> > Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
> > <gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f>
> > <gfid:164f888f-2049-49e6-ad26-c758ee091863>
> > <gfid:650efeca-b45c-413b-acc3-f0a5853ccebd>
> > <gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e>
> > /recordings/834723/14391 - Possibly undergoing heal
> >
> > <gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c>
> > <gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb>
> > Number of entries: 7
> >
> > Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
> > Number of entries: 0
> >
> >
> > If I query each brick directly for the number of files/directories
> > within that, I get 1731 on gluster1a-1 and gluster2a-1, but 1737 on the
> > other two, using this command:
> >
> > # find /data/brick/callrec/recordings/834723/14391 -print | wc -l
> >
> > Cheers,
> > Kingsley.
> >
> > On Mon, 2015-08-10 at 11:05 +0100, Kingsley wrote:
> > > Sorry for the blind panic - restarting the volume seems to have fixed
> > > it.
> > >
> > > But then my next question - why is this necessary? Surely it undermines
> > > the whole point of a high availability system?
> > >
> > > Cheers,
> > > Kingsley.
> > >
> > > On Mon, 2015-08-10 at 10:53 +0100, Kingsley wrote:
> > > > Hi,
> > > >
> > > > We have a 4 way replicated volume using gluster 3.6.3 on CentOS 7.
> > > >
> > > > Over the weekend I did a yum update on each of the bricks in turn, but
> > > > now when clients (using fuse mounts) try to access the volume, it hangs.
> > > > Gluster itself wasn't updated (we've disabled that repo so that we keep
> > > > to 3.6.3 for now).
> > > >
> > > > This was what I did:
> > > >
> > > >       * on first brick, "yum update"
> > > >       * reboot brick
> > > >       * watch "gluster volume status" on another brick and wait for it
> > > >         to say all 4 bricks are online before proceeding to update the
> > > >         next brick
> > > >
> > > > I was expecting the clients might pause 30 seconds while they notice a
> > > > brick is offline, but then recover.
> > > >
> > > > I've tried re-mounting clients, but that hasn't helped.
> > > >
> > > > I can't see much data in any of the log files.
> > > >
> > > > I've tried "gluster volume heal callrec" but it doesn't seem to have
> > > > helped.
> > > >
> > > > What shall I do next?
> > > >
> > > > I've pasted some stuff below in case any of it helps.
> > > >
> > > > Cheers,
> > > > Kingsley.
> > > >
> > > > [root@gluster1b-1 ~]# gluster volume info callrec
> > > >
> > > > Volume Name: callrec
> > > > Type: Replicate
> > > > Volume ID: a39830b7-eddb-4061-b381-39411274131a
> > > > Status: Started
> > > > Number of Bricks: 1 x 4 = 4
> > > > Transport-type: tcp
> > > > Bricks:
> > > > Brick1: gluster1a-1:/data/brick/callrec
> > > > Brick2: gluster1b-1:/data/brick/callrec
> > > > Brick3: gluster2a-1:/data/brick/callrec
> > > > Brick4: gluster2b-1:/data/brick/callrec
> > > > Options Reconfigured:
> > > > performance.flush-behind: off
> > > > [root@gluster1b-1 ~]#
> > > >
> > > >
> > > > [root@gluster1b-1 ~]# gluster volume status callrec
> > > > Status of volume: callrec
> > > > Gluster process                                 Port    Online  Pid
> > > > ------------------------------------------------------------------------------
> > > > Brick gluster1a-1:/data/brick/callrec           49153   Y       6803
> > > > Brick gluster1b-1:/data/brick/callrec           49153   Y       2614
> > > > Brick gluster2a-1:/data/brick/callrec           49153   Y       2645
> > > > Brick gluster2b-1:/data/brick/callrec           49153   Y       4325
> > > > NFS Server on localhost                         2049    Y       2769
> > > > Self-heal Daemon on localhost                   N/A     Y       2789
> > > > NFS Server on gluster2a-1                       2049    Y       2857
> > > > Self-heal Daemon on gluster2a-1                 N/A     Y       2814
> > > > NFS Server on 88.151.41.100                     2049    Y       6833
> > > > Self-heal Daemon on 88.151.41.100               N/A     Y       6824
> > > > NFS Server on gluster2b-1                       2049    Y       4428
> > > > Self-heal Daemon on gluster2b-1                 N/A     Y       4387
> > > >
> > > > Task Status of Volume callrec
> > > > ------------------------------------------------------------------------------
> > > > There are no active volume tasks
> > > >
> > > > [root@gluster1b-1 ~]#
> > > >
> > > >
> > > > [root@gluster1b-1 ~]# gluster volume heal callrec info
> > > > Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
> > > > /to_process - Possibly undergoing heal
> > > >
> > > > Number of entries: 1
> > > >
> > > > Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
> > > > Number of entries: 0
> > > >
> > > > Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
> > > > /to_process - Possibly undergoing heal
> > > >
> > > > Number of entries: 1
> > > >
> > > > Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
> > > > Number of entries: 0
> > > >
> > > > [root@gluster1b-1 ~]#
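
[Aside on the per-brick count comparison mentioned above: the loop below is only a sketch, not something from the thread. It assumes root ssh access to the four brick hosts named in the volume info, with the same brick path on each.]

    # hypothetical helper, not part of the original thread:
    # run the same count on every brick and print host: count
    for h in gluster1a-1 gluster1b-1 gluster2a-1 gluster2b-1; do
        printf '%s: ' "$h"
        ssh root@"$h" 'find /data/brick/callrec/recordings/834723/14391 -print | wc -l'
    done

In this thread the two bricks with the lower count (1731) are the same two that list pending entries in the heal info output, which is consistent with those replicas still waiting to be healed.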
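[A note on the update procedure quoted above: for a replicated volume it is generally safer to wait not only until all bricks show online again, but also until self-heal has caught up, before taking down the next brick. The commands below are a sketch of that wait step; the volume name comes from the thread, the workflow around it is an assumption, not something the posters ran.]

    # after a brick comes back, check it is online ...
    gluster volume status callrec
    # ... then wait for pending heals to drain before touching the next brick
    watch -n 10 'gluster volume heal callrec info | grep "Number of entries"'
    # proceed only once every brick reports "Number of entries: 0"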
Atin Mukherjee
2015-Aug-10 16:09 UTC
[Gluster-users] volume not working after yum update - gluster 3.6.3
-Atin
Sent from one plus one

On Aug 10, 2015 9:37 PM, "Kingsley" <gluster at gluster.dogwind.com> wrote:
>
> On Mon, 2015-08-10 at 21:34 +0530, Atin Mukherjee wrote:
> > -Atin
> > Sent from one plus one
> > On Aug 10, 2015 7:19 PM, "Kingsley" <gluster at gluster.dogwind.com> wrote:
> > >
> > > Further to this, the volume doesn't seem overly healthy. Any idea how I
> > > can get it back into a working state?
> > >
> > > Trying to access one particular directory on the clients just hangs. If
> > > I query heal info, that directory appears in the output as possibly
> > > undergoing heal (actual directory name changed as it's private info):
> > Can you execute strace and see which call is stuck? That would help us
> > to get to the exact component which we would need to look at.
>
> Hi,
>
> I've never used strace before. Could you give me the command line to
> type?

Just type strace followed by the command

> Then ... do I need to run something on one of the bricks while strace is
> running?
>
> Cheers,
> Kingsley.
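
[To make the strace suggestion concrete: something along the lines below, run on a client against the hanging path, would show the last system call that never returns. This is only a sketch; the mount point /mnt/callrec is an assumption, not taken from the thread.]

    # sketch only; /mnt/callrec stands in for the real fuse mount point
    strace -f -tt -T -o /tmp/hang.strace ls /mnt/callrec/recordings/834723/14391
    tail /tmp/hang.strace    # the stuck call is the final line with no return value

Here -f follows child processes, -tt adds timestamps and -T shows how long each call took, which together make it easy to spot where the client is blocked.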