Kingsley
2015-Aug-10 16:07 UTC
[Gluster-users] volume not working after yum update - gluster 3.6.3
On Mon, 2015-08-10 at 21:34 +0530, Atin Mukherjee wrote:
> -Atin
> Sent from one plus one
> On Aug 10, 2015 7:19 PM, "Kingsley" <gluster at gluster.dogwind.com> wrote:
> >
> > Further to this, the volume doesn't seem overly healthy. Any idea how I
> > can get it back into a working state?
> >
> > Trying to access one particular directory on the clients just hangs. If
> > I query heal info, that directory appears in the output as possibly
> > undergoing heal (actual directory name changed as it's private info):
> Can you execute strace and see which call is stuck? That would help us
> to get to the exact component which we would need to look at.

Hi,

I've never used strace before. Could you give me the command line to
type?

Then ... do I need to run something on one of the bricks while strace is
running?

Cheers,
Kingsley.

> >
> > [root@gluster1b-1 ~]# gluster volume heal callrec info
> > Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
> > <gfid:164f888f-2049-49e6-ad26-c758ee091863>
> > /recordings/834723/14391 - Possibly undergoing heal
> >
> > <gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f>
> > <gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e>
> > <gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c>
> > <gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb>
> > <gfid:650efeca-b45c-413b-acc3-f0a5853ccebd>
> > Number of entries: 7
> >
> > Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
> > Number of entries: 0
> >
> > Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
> > <gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f>
> > <gfid:164f888f-2049-49e6-ad26-c758ee091863>
> > <gfid:650efeca-b45c-413b-acc3-f0a5853ccebd>
> > <gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e>
> > /recordings/834723/14391 - Possibly undergoing heal
> >
> > <gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c>
> > <gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb>
> > Number of entries: 7
> >
> > Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
> > Number of entries: 0
> >
> >
> > If I query each brick directly for the number of files/directories
> > within that, I get 1731 on gluster1a-1 and gluster2a-1, but 1737 on the
> > other two, using this command:
> >
> > # find /data/brick/callrec/recordings/834723/14391 -print | wc -l
> >
> > Cheers,
> > Kingsley.
> >
> > On Mon, 2015-08-10 at 11:05 +0100, Kingsley wrote:
> > > Sorry for the blind panic - restarting the volume seems to have fixed
> > > it.
> > >
> > > But then my next question - why is this necessary? Surely it undermines
> > > the whole point of a high availability system?
> > >
> > > Cheers,
> > > Kingsley.
> > >
> > > On Mon, 2015-08-10 at 10:53 +0100, Kingsley wrote:
> > > > Hi,
> > > >
> > > > We have a 4 way replicated volume using gluster 3.6.3 on CentOS 7.
> > > >
> > > > Over the weekend I did a yum update on each of the bricks in turn, but
> > > > now when clients (using fuse mounts) try to access the volume, it hangs.
> > > > Gluster itself wasn't updated (we've disabled that repo so that we keep
> > > > to 3.6.3 for now).
> > > >
> > > > This was what I did:
> > > >
> > > >       * on first brick, "yum update"
> > > >       * reboot brick
> > > >       * watch "gluster volume status" on another brick and wait for it
> > > >         to say all 4 bricks are online before proceeding to update the
> > > >         next brick
> > > >
> > > > I was expecting the clients might pause 30 seconds while they notice a
> > > > brick is offline, but then recover.
> > > >
> > > > I've tried re-mounting clients, but that hasn't helped.
> > > >
> > > > I can't see much data in any of the log files.
> > > >
> > > > I've tried "gluster volume heal callrec" but it doesn't seem to have
> > > > helped.
> > > >
> > > > What shall I do next?
> > > >
> > > > I've pasted some stuff below in case any of it helps.
> > > >
> > > > Cheers,
> > > > Kingsley.
> > > >
> > > > [root@gluster1b-1 ~]# gluster volume info callrec
> > > >
> > > > Volume Name: callrec
> > > > Type: Replicate
> > > > Volume ID: a39830b7-eddb-4061-b381-39411274131a
> > > > Status: Started
> > > > Number of Bricks: 1 x 4 = 4
> > > > Transport-type: tcp
> > > > Bricks:
> > > > Brick1: gluster1a-1:/data/brick/callrec
> > > > Brick2: gluster1b-1:/data/brick/callrec
> > > > Brick3: gluster2a-1:/data/brick/callrec
> > > > Brick4: gluster2b-1:/data/brick/callrec
> > > > Options Reconfigured:
> > > > performance.flush-behind: off
> > > > [root@gluster1b-1 ~]#
> > > >
> > > >
> > > > [root@gluster1b-1 ~]# gluster volume status callrec
> > > > Status of volume: callrec
> > > > Gluster process                                 Port    Online  Pid
> > > > ------------------------------------------------------------------------------
> > > > Brick gluster1a-1:/data/brick/callrec           49153   Y       6803
> > > > Brick gluster1b-1:/data/brick/callrec           49153   Y       2614
> > > > Brick gluster2a-1:/data/brick/callrec           49153   Y       2645
> > > > Brick gluster2b-1:/data/brick/callrec           49153   Y       4325
> > > > NFS Server on localhost                         2049    Y       2769
> > > > Self-heal Daemon on localhost                   N/A     Y       2789
> > > > NFS Server on gluster2a-1                       2049    Y       2857
> > > > Self-heal Daemon on gluster2a-1                 N/A     Y       2814
> > > > NFS Server on 88.151.41.100                     2049    Y       6833
> > > > Self-heal Daemon on 88.151.41.100               N/A     Y       6824
> > > > NFS Server on gluster2b-1                       2049    Y       4428
> > > > Self-heal Daemon on gluster2b-1                 N/A     Y       4387
> > > >
> > > > Task Status of Volume callrec
> > > > ------------------------------------------------------------------------------
> > > > There are no active volume tasks
> > > >
> > > > [root@gluster1b-1 ~]#
> > > >
> > > >
> > > > [root@gluster1b-1 ~]# gluster volume heal callrec info
> > > > Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
> > > > /to_process - Possibly undergoing heal
> > > >
> > > > Number of entries: 1
> > > >
> > > > Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
> > > > Number of entries: 0
> > > >
> > > > Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
> > > > /to_process - Possibly undergoing heal
> > > >
> > > > Number of entries: 1
> > > >
> > > > Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
> > > > Number of entries: 0
> > > >
> > > > [root@gluster1b-1 ~]#
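
[Aside on the per-brick count comparison mentioned above: the loop below is only a sketch, not something from the thread. It assumes root ssh access to the four brick hosts named in the volume info, with the same brick path on each.]

    # hypothetical helper, not part of the original thread:
    # run the same count on every brick and print host: count
    for h in gluster1a-1 gluster1b-1 gluster2a-1 gluster2b-1; do
        printf '%s: ' "$h"
        ssh root@"$h" 'find /data/brick/callrec/recordings/834723/14391 -print | wc -l'
    done

In this thread the two bricks with the lower count (1731) are the same two that list pending entries in the heal info output, which is consistent with those replicas still waiting to be healed.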
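[A note on the update procedure quoted above: for a replicated volume it is generally safer to wait not only until all bricks show online again, but also until self-heal has caught up, before taking down the next brick. The commands below are a sketch of that wait step; the volume name comes from the thread, the workflow around it is an assumption, not something the posters ran.]

    # after a brick comes back, check it is online ...
    gluster volume status callrec
    # ... then wait for pending heals to drain before touching the next brick
    watch -n 10 'gluster volume heal callrec info | grep "Number of entries"'
    # proceed only once every brick reports "Number of entries: 0"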
Atin Mukherjee
2015-Aug-10 16:09 UTC
[Gluster-users] volume not working after yum update - gluster 3.6.3
-Atin
Sent from one plus one

On Aug 10, 2015 9:37 PM, "Kingsley" <gluster at gluster.dogwind.com> wrote:
>
> On Mon, 2015-08-10 at 21:34 +0530, Atin Mukherjee wrote:
> > -Atin
> > Sent from one plus one
> > On Aug 10, 2015 7:19 PM, "Kingsley" <gluster at gluster.dogwind.com> wrote:
> > >
> > > Further to this, the volume doesn't seem overly healthy. Any idea how I
> > > can get it back into a working state?
> > >
> > > Trying to access one particular directory on the clients just hangs. If
> > > I query heal info, that directory appears in the output as possibly
> > > undergoing heal (actual directory name changed as it's private info):
> > Can you execute strace and see which call is stuck? That would help us
> > to get to the exact component which we would need to look at.
>
> Hi,
>
> I've never used strace before. Could you give me the command line to
> type?

Just type strace followed by the command

> Then ... do I need to run something on one of the bricks while strace is
> running?
>
> Cheers,
> Kingsley.
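
[To make the strace suggestion concrete: something along the lines below, run on a client against the hanging path, would show the last system call that never returns. This is only a sketch; the mount point /mnt/callrec is an assumption, not taken from the thread.]

    # sketch only; /mnt/callrec stands in for the real fuse mount point
    strace -f -tt -T -o /tmp/hang.strace ls /mnt/callrec/recordings/834723/14391
    tail /tmp/hang.strace    # the stuck call is the final line with no return value

Here -f follows child processes, -tt adds timestamps and -T shows how long each call took, which together make it easy to spot where the client is blocked.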