Kingsley
2015-Aug-10 10:05 UTC
[Gluster-users] volume not working after yum update - gluster 3.6.3
Sorry for the blind panic - restarting the volume seems to have fixed it.

But then my next question - why is this necessary? Surely it undermines
the whole point of a high availability system?

Cheers,
Kingsley.

On Mon, 2015-08-10 at 10:53 +0100, Kingsley wrote:
> Hi,
>
> We have a 4 way replicated volume using gluster 3.6.3 on CentOS 7.
>
> Over the weekend I did a yum update on each of the bricks in turn, but
> now when clients (using fuse mounts) try to access the volume, it hangs.
> Gluster itself wasn't updated (we've disabled that repo so that we keep
> to 3.6.3 for now).
>
> This was what I did:
>
>   * on first brick, "yum update"
>   * reboot brick
>   * watch "gluster volume status" on another brick and wait for it
>     to say all 4 bricks are online before proceeding to update the
>     next brick
>
> I was expecting the clients might pause 30 seconds while they notice a
> brick is offline, but then recover.
>
> I've tried re-mounting clients, but that hasn't helped.
>
> I can't see much data in any of the log files.
>
> I've tried "gluster volume heal callrec" but it doesn't seem to have
> helped.
>
> What shall I do next?
>
> I've pasted some stuff below in case any of it helps.
>
> Cheers,
> Kingsley.
>
> [root@gluster1b-1 ~]# gluster volume info callrec
>
> Volume Name: callrec
> Type: Replicate
> Volume ID: a39830b7-eddb-4061-b381-39411274131a
> Status: Started
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: gluster1a-1:/data/brick/callrec
> Brick2: gluster1b-1:/data/brick/callrec
> Brick3: gluster2a-1:/data/brick/callrec
> Brick4: gluster2b-1:/data/brick/callrec
> Options Reconfigured:
> performance.flush-behind: off
> [root@gluster1b-1 ~]#
>
> [root@gluster1b-1 ~]# gluster volume status callrec
> Status of volume: callrec
> Gluster process                                 Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick gluster1a-1:/data/brick/callrec           49153   Y       6803
> Brick gluster1b-1:/data/brick/callrec           49153   Y       2614
> Brick gluster2a-1:/data/brick/callrec           49153   Y       2645
> Brick gluster2b-1:/data/brick/callrec           49153   Y       4325
> NFS Server on localhost                         2049    Y       2769
> Self-heal Daemon on localhost                   N/A     Y       2789
> NFS Server on gluster2a-1                       2049    Y       2857
> Self-heal Daemon on gluster2a-1                 N/A     Y       2814
> NFS Server on 88.151.41.100                     2049    Y       6833
> Self-heal Daemon on 88.151.41.100               N/A     Y       6824
> NFS Server on gluster2b-1                       2049    Y       4428
> Self-heal Daemon on gluster2b-1                 N/A     Y       4387
>
> Task Status of Volume callrec
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> [root@gluster1b-1 ~]#
>
> [root@gluster1b-1 ~]# gluster volume heal callrec info
> Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
> /to_process - Possibly undergoing heal
>
> Number of entries: 1
>
> Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
> Number of entries: 0
>
> Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
> /to_process - Possibly undergoing heal
>
> Number of entries: 1
>
> Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
> Number of entries: 0
>
> [root@gluster1b-1 ~]#
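For what it's worth, a minimal sketch (untested, and assuming the 3.6.x
"gluster volume heal <vol> info" output format shown above) of the extra
check that could sit between bricks in a rolling update - waiting for
pending heal entries to drain to zero, rather than only waiting for all
bricks to show online:

#!/bin/bash
# Block until "gluster volume heal callrec info" reports zero pending
# entries across all bricks before updating/rebooting the next brick.
until gluster volume heal callrec info | awk -F: '
    /^Number of entries/ { total += $2 }
    END { exit (total > 0) }'
do
    echo "waiting for self-heal to finish..."
    sleep 10
done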
Atin Mukherjee
2015-Aug-10 10:21 UTC
[Gluster-users] volume not working after yum update - gluster 3.6.3
On 08/10/2015 03:35 PM, Kingsley wrote:
> Sorry for the blind panic - restarting the volume seems to have fixed
> it.
>
> But then my next question - why is this necessary? Surely it undermines
> the whole point of a high availability system?
>
> Cheers,
> Kingsley.
>
> On Mon, 2015-08-10 at 10:53 +0100, Kingsley wrote:
>> Hi,
>>
>> We have a 4 way replicated volume using gluster 3.6.3 on CentOS 7.
>>
>> Over the weekend I did a yum update on each of the bricks in turn, but
>> now when clients (using fuse mounts) try to access the volume, it hangs.

What does the mount log file say when you tried to access the volume? Can
you attach the mount log file?

>> [rest of the original message and command output trimmed; quoted in
>> full above]

--
~Atin
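For reference, with a FUSE mount the client log normally lives under
/var/log/glusterfs/, named after the mount point with slashes replaced by
dashes - for example, for a hypothetical mount at /mnt/callrec:

# client-side log for a volume fuse-mounted at /mnt/callrec
tail -n 100 /var/log/glusterfs/mnt-callrec.log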
Kingsley
2015-Aug-10 13:49 UTC
[Gluster-users] volume not working after yum update - gluster 3.6.3
Further to this, the volume doesn't seem overly healthy. Any idea how I
can get it back into a working state?

Trying to access one particular directory on the clients just hangs. If I
query heal info, that directory appears in the output as possibly
undergoing heal (actual directory name changed as it's private info):

[root@gluster1b-1 ~]# gluster volume heal callrec info
Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
<gfid:164f888f-2049-49e6-ad26-c758ee091863>
/recordings/834723/14391 - Possibly undergoing heal
<gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f>
<gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e>
<gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c>
<gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb>
<gfid:650efeca-b45c-413b-acc3-f0a5853ccebd>
Number of entries: 7

Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0

Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
<gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f>
<gfid:164f888f-2049-49e6-ad26-c758ee091863>
<gfid:650efeca-b45c-413b-acc3-f0a5853ccebd>
<gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e>
/recordings/834723/14391 - Possibly undergoing heal
<gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c>
<gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb>
Number of entries: 7

Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
Number of entries: 0

If I query each brick directly for the number of files/directories within
that directory, I get 1731 on gluster1a-1 and gluster2a-1, but 1737 on the
other two, using this command:

# find /data/brick/callrec/recordings/834723/14391 -print | wc -l

Cheers,
Kingsley.

On Mon, 2015-08-10 at 11:05 +0100, Kingsley wrote:
> Sorry for the blind panic - restarting the volume seems to have fixed
> it.
>
> But then my next question - why is this necessary? Surely it undermines
> the whole point of a high availability system?
>
> [earlier messages and command output trimmed; quoted in full above]
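In case it helps anyone reading along: on a brick, a <gfid:...> entry from
heal info can usually be mapped back to a real path via the .glusterfs
index. A sketch, assuming the standard brick layout (regular files there
are hard links to the real file; directories are symlinks):

# Resolve one of the gfids above on a brick (example gfid from the output).
# For regular files, -samefile follows the hard link back to the real path;
# for directories, the .glusterfs entry is a symlink, so use readlink on it
# instead.
G=164f888f-2049-49e6-ad26-c758ee091863
B=/data/brick/callrec
find "$B" -samefile "$B/.glusterfs/${G:0:2}/${G:2:2}/$G" 2>/dev/null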