Arif Ali
2016-Jun-15 10:27 UTC
[Gluster-users] [RESOLVED] issues recovering machine in gluster
On 15 June 2016 at 08:55, Arif Ali <mail at arif-ali.co.uk> wrote:
> On 15 June 2016 at 08:09, Atin Mukherjee <amukherj at redhat.com> wrote:
>> On 06/15/2016 12:14 PM, Arif Ali wrote:
>>> On 15 June 2016 at 06:48, Atin Mukherjee <amukherj at redhat.com> wrote:
>>>> On 06/15/2016 11:06 AM, Gandalf Corvotempesta wrote:
>>>>> On 15 Jun 2016 07:09, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>>> To get rid of this situation you'd need to stop all the running
>>>>>> glusterd instances, go into the /var/lib/glusterd/peers folder on
>>>>>> all the nodes, and manually correct the UUID file names and their
>>>>>> content if required.
>>>>>
>>>>> If I understood properly, the only way to fix this is by bringing the
>>>>> whole cluster down? You said "you'd need to stop all the running
>>>>> glusterd instances". I hope you are referring to all instances on the
>>>>> failed node...
>>>>
>>>> No. Since the configuration is synced across all the nodes, any
>>>> incorrect data gets replicated throughout. So in this case, to be on
>>>> the safer side and validate the correctness, the glusterd instances on
>>>> *all* the nodes should be brought down. Having said that, this doesn't
>>>> impact I/O, as the management path is separate from the I/O path.
>>>
>>> As a sanity check, one of the things I did last night was to reboot the
>>> whole gluster system while I had downtime arranged. I thought this was
>>> something that would be asked, as I had seen similar requests on the
>>> mailing list previously.
>>>
>>> Unfortunately, though, it didn't fix the problem.
>>
>> A reboot alone is not going to solve the problem. You'd need to correct
>> the configuration as I explained earlier in this thread.
>> If it doesn't, please send me the content of the
>> /var/lib/glusterd/peers/ directory and the /var/lib/glusterd/glusterd.info
>> file from all the nodes where glusterd instances are running. I'll take a
>> look, correct them, and send them back to you.
>
> Thanks Atin,
>
> Apologies, I missed your mail, as I was travelling.
>
> I have checked the relevant files you mentioned, and they look correct to
> me, but I have attached them for sanity; maybe you can spot something that
> I have not seen.

I have been discussing the issue with Atin on IRC, and we have resolved the
problem. Thanks Atin, it was much appreciated.

For the purpose of this list: I had a UUID file in /var/lib/glusterd/peers
on the host that matched the host's own UUID. This was not required, since
that directory should only describe the *other* peers. Once I removed the
file named after the UUID of the node where glusterd was running, the node
was able to function correctly.
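For anyone hitting the same symptom, the check can be sketched as a small shell helper. This is a hypothetical function, not a gluster tool; it only assumes the layout described above, where /var/lib/glusterd/glusterd.info holds the local node's identity as a "UUID=..." line and /var/lib/glusterd/peers/ holds one file per *remote* peer, named by that peer's UUID:

```shell
# Hypothetical helper (not part of gluster): report whether the peers/
# directory wrongly contains an entry named after the local node's own
# UUID, which was the root cause in this thread.
#
# Usage: check_self_peer /var/lib/glusterd
check_self_peer() {
    confdir=$1
    # glusterd.info stores the local node's identity as "UUID=<uuid>"
    self_uuid=$(sed -n 's/^UUID=//p' "$confdir/glusterd.info")
    if [ -n "$self_uuid" ] && [ -e "$confdir/peers/$self_uuid" ]; then
        # peers/ should only describe the *other* nodes; a file named
        # after our own UUID is the stale entry that has to be removed
        # (with glusterd stopped) before the node can rejoin cleanly.
        echo "stale self entry: $confdir/peers/$self_uuid"
        return 1
    fi
    echo "ok: no self entry in $confdir/peers"
    return 0
}
```

Run it on each node before deleting anything; if it flags a file, stop glusterd on that node first, remove the flagged file, and start glusterd again.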