Pavel Riha
2015-Apr-15 15:05 UTC
[Gluster-users] how to check/fix underlaying partition error?
Thank you for your reply.

But what is the right way to do this? Stopping the glusterd service does not stop the glusterfsd daemons themselves:
https://bugzilla.redhat.com/show_bug.cgi?id=988946

I also have more volumes running, and only one has this problem. I haven't found any official way to stop the processes, so I just killed them. It worked: the partition is repaired and seems OK for now.

But how do I start the brick again? I didn't save the command line shown in ps, and it was crazy. Looking at the other running bricks, the command line is full of generated values (UUID, socket, port), and the port, for example, is not the same as on the other server.

So I restarted the glusterd service. Nothing happened at first and I was losing hope, but after a while I noticed that the process was running, so maybe glusterd started it after a delay.

There should be some way to stop, or at least start, a single brick.

Pavel

On 15.4.2015 11:59, Sander Zijlstra wrote:
> Hi Pavel,
>
> you can simply stop the glusterd service and run the fsck; it's similar to rebooting a server which is part of a replicated volume. If everything was OK beforehand, you can simply take down one of the two servers, and once it comes back online it will heal each file which hasn't been copied already.
>
> Do take care of any client which has the volume mounted using the server you take down; it will lose its connection as well.
>
> Met vriendelijke groet / kind regards,
>
> Sander Zijlstra
>
> Linux Engineer | SURFsara | Science Park 140 | 1098XG Amsterdam |
> +31 (0)6 43 99 12 47 | sander.zijlstra at surfsara.nl | www.surfsara.nl |
>
> ----- Original Message -----
> From: "Pavel Riha" <pavel.riha at trilogic.cz>
> To: gluster-users at gluster.org
> Sent: Wednesday, 15 April, 2015 10:28:50
> Subject: [Gluster-users] how to check/fix underlaying partition error?
>
> Hi guys,
>
> I have a replicated glusterfs (v3.4.2) on two servers and I found the logs filled with IO errors on one server only. There is no hardware error in /var/log/messages, only XFS errors, so I guess the filesystem could be corrupted.
>
> My question is: how do I stop or pause this brick and run fsck? Given the replicate feature, I expect there is no need to stop the gluster volume (there are some Xen VMs running).
>
> What is the right way to do it, keeping in mind re-adding the brick later and a fast rebuild/sync?
>
> Thanks for any tips,
>
> Pavel
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
Jiri Hoogeveen
2015-Apr-16 06:49 UTC
[Gluster-users] how to check/fix underlaying partition error?
Hi Pavel,

killing the brick process is the way to go. This way, all other bricks on that server will keep working.

After you replace or fix the disk, a restart of the glusterd process should be enough to get the brick back online. (The self-heal scan can take some IO.)

Do you have any logs from the brick that would not start?

By the way, IO errors on XFS? Did you lose some files from brick/.glusterfs? That could explain why the brick would not start up.

Grtz,
Jiri
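
For readers following the thread, the sequence discussed here (stop one brick, repair its filesystem, bring the brick back, check healing) can be written down as a rough sketch. It only builds the command list and runs nothing. The volume name, PID, device and mount point are hypothetical placeholders. Using xfs_repair as the repair tool and "gluster volume start ... force" as an alternative to restarting glusterd are assumptions not taken from the thread, so check them against the documentation for your GlusterFS version first.

```python
import shlex
from typing import List


def brick_repair_plan(volume: str, brick_pid: int,
                      device: str, mountpoint: str) -> List[List[str]]:
    """Return the commands (as argv lists) for repairing one brick.

    Every argument is a hypothetical placeholder; nothing is executed.
    """
    return [
        # Look up the PID of the affected brick process.
        ["gluster", "volume", "status", volume],
        # Stop only that brick; other bricks on the server keep running.
        ["kill", "-TERM", str(brick_pid)],
        # Assumption: the brick is XFS, and xfs_repair needs it unmounted.
        ["umount", mountpoint],
        ["xfs_repair", device],
        ["mount", device, mountpoint],
        # Assumption: ask glusterd to start bricks that are not running,
        # instead of restarting the glusterd service as in the thread.
        ["gluster", "volume", "start", volume, "force"],
        # List entries that still need to be healed.
        ["gluster", "volume", "heal", volume, "info"],
    ]


def render(plan: List[List[str]]) -> str:
    """Format the plan as one shell-quoted command per line."""
    return "\n".join(" ".join(shlex.quote(arg) for arg in cmd) for cmd in plan)


if __name__ == "__main__":
    # Example values only; substitute your own volume, PID and device.
    print(render(brick_repair_plan("vol0", 1234, "/dev/sdb1", "/bricks/b1")))
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; each command can then be reviewed and run by hand on the affected server.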