Dear all

Last week I posted a query about a machine that had failed but whose underlying hard disk, holding the gluster brick, was still good. I've made some progress in restoring it. I now have a problem where the newly restored machine becomes its own peer, which then breaks everything.

1. The gluster daemons are off on all peers; the contents of /var/lib/glusterd/peers look good.

2. I start the gluster daemons on all peers. All looks good.

3. For about 2 minutes there is no obvious problem: "gluster peer status" looks good on any machine, and so does "gluster volume status A01".

4. Then at some point the /var/lib/glusterd/peers directory of the new, restored machine gets an entry for itself and things start breaking. A typical, and understandable, error message is:

   Unable to get lock for uuid: 4fb930f7-554e-462a-9204-4592591feeb8, lock held by: 4fb930f7-554e-462a-9204-4592591feeb8

5. This is repeatable: if I stop the daemons, remove the offending entry in /var/lib/glusterd/peers, and restart, the same behavior occurs. All is good for a minute or two, and then something magically puts an entry back in /var/lib/glusterd/peers. (The cycle I follow is sketched below.)

In an earlier step of the restore I had a different error, about mismatching cksums, and what I did then may be the cause of this problem. Searching the list archives I found someone with a similar cksum problem, and the proposed solution was to copy /var/lib/glusterd/vols/ from another of the peers to the new machine (second sketch below). This may not be the issue, but it is the only thing I did that I think was unconventional.

I am running version 3.7.5-19 on Scientific Linux 6.8.

If anyone can suggest a way forward I would be grateful.

Many thanks

Scott
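For concreteness, the cycle in step 5 looks roughly like this. This is only a sketch: it assumes SysV init on Scientific Linux 6, and that the offending entry is the peers file named after the machine's own uuid (the uuid from the lock error above).

    # on every peer (Scientific Linux 6, SysV init):
    service glusterd stop

    # on the restored machine only: remove the stale self-entry;
    # peer files are named after the peer's uuid
    rm /var/lib/glusterd/peers/4fb930f7-554e-462a-9204-4592591feeb8

    # on every peer:
    service glusterd start

    # then watch for the self-entry reappearing
    watch -n 10 'ls /var/lib/glusterd/peers/; gluster peer status'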
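The earlier, possibly unconventional step was something like the following sketch; "goodnode" is a placeholder for the healthy peer I copied from, and the exact command I used may have differed.

    # with glusterd stopped on the restored machine, copy the volume
    # definitions from a healthy peer, overwriting the local ones
    service glusterd stop
    rsync -a --delete goodnode:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
    service glusterd start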
On Fri, Feb 17, 2017 at 11:19 AM, Scott Hazelhurst <Scott.Hazelhurst at wits.ac.za> wrote:

> 5. This is repeatable: if I stop the daemons, remove the offending entry
> in /var/lib/glusterd/peers, and restart, the same behavior occurs. All is
> good for a minute or two, and then something magically puts an entry back
> in /var/lib/glusterd/peers.

I'd need a few more details here:

1. the output of "gluster peer status"
2. the output of "cat /var/lib/glusterd/glusterd.info" and "cat /var/lib/glusterd/peers/*" from all the nodes

--
~ Atin (atinm)
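(One way to collect everything Atin is asking for in one pass is a loop like this, as a sketch; node1, node2, node3 and root ssh access are assumptions, so substitute the actual peer hostnames.)

    for h in node1 node2 node3; do
        echo "==== $h ===="
        ssh root@$h 'gluster peer status;
                     echo ---;
                     cat /var/lib/glusterd/glusterd.info;
                     echo ---;
                     cat /var/lib/glusterd/peers/*'
    done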
Does your repaired server have the correct uuid in /var/lib/glusterd/glusterd.info?

On February 16, 2017 9:49:56 PM PST, Scott Hazelhurst <Scott.Hazelhurst at wits.ac.za> wrote:

> 4. Then at some point the /var/lib/glusterd/peers directory of the new,
> restored machine gets an entry for itself and things start breaking.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
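(The check being suggested, as a sketch: run the first two commands on the restored machine and the last on any healthy peer.)

    # the restored machine's own identity
    cat /var/lib/glusterd/glusterd.info     # look at the UUID= line

    # that uuid must NOT appear as a filename here
    ls /var/lib/glusterd/peers/

    # on a healthy peer: the peers file describing the restored machine
    # should carry the same uuid= value as its glusterd.info
    cat /var/lib/glusterd/peers/*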