Jeevan Patnaik
2018-Nov-25 10:55 UTC
[Gluster-users] Can glusterd be restarted running on all nodes at once while clients are mounted?
Hi,

I have a few different issues:

I restarted the glusterd service on my 72 nodes almost in parallel with Ansible while the Gluster NFS clients were still in a mounted state. After that, many of the Gluster peers went into the Rejected state. In the logs I see msg id 106010 stating that a checksum doesn't match. I'm confused about which checksum that is and how it changed after the restart.

I restarted because gluster volume status commands were giving timeouts. I have tiering enabled on the volume and was trying to detach the tier, and that never completed either. The status shows only "in progress", even though the tiered volume contains only a few hundred 8 MB files I created for testing. My overall experience with Gluster tiering is really bad :(

Besides, what's the best way to restore the old state if something goes wrong? Until now I have been using no volfile at all; I only use gluster volume commands to configure my cluster. Do I need to use a volfile in order to restore something?

Gluster version is 3.12.15. I have checked the op-version on all nodes and they are all the same.

Regards,
Jeevan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20181125/137f83ce/attachment.html>
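(Not from the original post, just a sketch of the alternative: the restart could be serialized so that only one glusterd is down at any moment. The node names and the dry-run wrapper below are placeholders; in real use RUN would be ssh and NODES the 72 pool members.)

```shell
#!/bin/sh
# Sketch of a serialized (one-node-at-a-time) glusterd restart, as an
# alternative to restarting all 72 nodes in parallel. RUN is a dry-run
# echo here so the sketch is self-contained; real use: RUN="ssh".
set -eu
RUN="echo DRY-RUN:"          # real use: RUN="ssh"
NODES="node01 node02 node03" # real use: the actual pool members

for host in $NODES; do
  $RUN "$host" systemctl restart glusterd
  # Wait until the daemon answers CLI calls before moving to the next node.
  $RUN "$host" "until gluster peer status >/dev/null 2>&1; do sleep 5; done"
done
```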
Jeevan Patnaik
2018-Nov-25 11:40 UTC
[Gluster-users] Can glusterd be restarted running on all nodes at once while clients are mounted?
Hi,

I think I understand something now: glusterd should not be restarted on all nodes at once. If this is true, can anyone provide a technical explanation of how it affects the checksum?

Also, it seems that to fix the rejected hosts, I need to clear /var/lib/glusterd except glusterd.info, start glusterd again, and re-run peer probe.

Regards,
Jeevan
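The cleanup step described above can be rehearsed on a throwaway copy of the state directory before touching the real one. A self-contained sketch (the mock layout is an assumption; in real use STATE_DIR would be /var/lib/glusterd, and glusterd.info is preserved because it holds the node's UUID):

```shell
#!/bin/sh
# Rehearsal of the recovery step on a mock state directory; in real
# use STATE_DIR would be /var/lib/glusterd on the rejected node.
set -eu
STATE_DIR="$(mktemp -d)/glusterd"

# Build a mock layout: glusterd.info (peer UUID) plus state to discard.
mkdir -p "$STATE_DIR/vols/testvol" "$STATE_DIR/peers"
echo "UUID=00000000-0000-0000-0000-000000000000" > "$STATE_DIR/glusterd.info"
echo "state=3" > "$STATE_DIR/peers/peer1"

# The actual cleanup: delete everything except glusterd.info.
find "$STATE_DIR" -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +

ls "$STATE_DIR"   # only glusterd.info remains
# Next, on the real node: restart glusterd, then from a healthy peer:
#   gluster peer probe <hostname>
```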
Jeevan Patnaik
2018-Nov-25 12:22 UTC
[Gluster-users] Can glusterd be restarted running on all nodes at once while clients are mounted?
Ah, I am now able to tell what the rejected hosts have in common: they are the hosts that aren't serving any bricks. Is it a bad idea to keep a host that isn't serving any bricks in the pool? Aren't they kept in sync with the other hosts?

Regarding my previous assumption that glusterd shouldn't be restarted on all nodes at once: I guess it's okay after all. Now I just have to understand the issue with the tiering. The logs are not helping.

Regards,
Jeevan.
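(If I understand right, the checksum that msg id 106010 complains about is the per-volume configuration checksum glusterd keeps under /var/lib/glusterd/vols/<volname>/cksum and compares during the peer handshake; a peer whose copy differs ends up Rejected. A self-contained sketch of spotting the odd node out; the mock directories and checksum values below stand in for the real files you would collect from each peer:)

```shell
#!/bin/sh
# Sketch: spotting the kind of volume-config checksum mismatch that
# glusterd logs as msg id 106010. Mock copies of the per-volume state
# stand in for /var/lib/glusterd/vols/<volname>/cksum from two peers.
set -eu
WORK="$(mktemp -d)"

mkdir -p "$WORK/node1/vols/tiervol" "$WORK/node2/vols/tiervol"
echo "info=1462516426" > "$WORK/node1/vols/tiervol/cksum"
echo "info=2084935012" > "$WORK/node2/vols/tiervol/cksum"   # stale copy

for n in node1 node2; do
  printf '%s: %s\n' "$n" "$(cat "$WORK/$n/vols/tiervol/cksum")"
done

if ! cmp -s "$WORK/node1/vols/tiervol/cksum" "$WORK/node2/vols/tiervol/cksum"; then
  echo "cksum mismatch: node2 would be rejected on handshake"
fi
```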