Michael Böhm
2021-Dec-27 12:55 UTC
[Gluster-users] Unable to upgrade nodes because of cksums mismatch
Hey guys,

I have a problem upgrading our nodes from 8.3 to 10.0. I just upgraded the first node and ran into the "cksums mismatch" problem: on the upgraded v10 node the checksums for all volumes are different than on the other v8 nodes. That leads to the node starting in a "Peer Rejected" state. I can only resolve this by following the steps suggested here:

https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/
(stopping glusterd, deleting /var/lib/glusterd/* except glusterd.info, starting glusterd, probing a v8 peer, then restarting glusterd again)

The cluster then seems healthy again, self-healing starts and everything looks fine - only the newly created cksums still differ from those on the other nodes. That means the healthy state only lasts until I reboot the node, at which point it all begins again from the start: the node comes up as peer rejected.

Now, I've read about the problem here:
https://github.com/gluster/glusterfs/issues/1332
(even though that says the problem should only occur when upgrading from versions earlier than v7)
and also here on the mailing list:
https://lists.gluster.org/pipermail/gluster-users/2021-November/039679.html
(I think I have the same problem, but unfortunately no solution was given there)

The suggested solutions seem to require upgrading all nodes, and the problem should be resolved once op.version is finally raised - but I don't think this approach can be done online, and there's not really a way for me to do it offline.

Why is this happening now and not when I upgraded from pre-7 to 7? All my nodes are on 8.3 and op.version is 80000.

One thing I might have done "wrong": when I upgraded to v8 I didn't run "gluster volume set <volname> fips-mode-rchecksum on" on the volumes - I think I just overlooked it in the docs. I have this option set only on the 2 volumes I created after upgrading to v8. But even on those 2 the cksums differ, so I guess it wouldn't help a lot to set the option on all the other volumes?

I really don't know what to do now. I roughly understand the problem, but not why it happens on a cluster that is entirely on v8. I can't take all 9 nodes down, upgrade them all to v10 and rely on "it's all good" after the final upgrade of op.version.

Can someone point me in a safe direction?

Regards

Mika
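For reference, the recovery sequence described above, spelled out as a sketch. HOST_V8 and VOLNAME are placeholders, not names from the original report; run this on the rejected, upgraded node:

    systemctl stop glusterd

    # Wipe glusterd's local state but keep the node identity (UUID),
    # as per the "Resolving Peer Rejected" procedure linked above.
    cd /var/lib/glusterd
    find . -mindepth 1 ! -name 'glusterd.info' -delete

    systemctl start glusterd

    # Re-fetch the cluster configuration from a still-healthy v8 peer.
    gluster peer probe HOST_V8

    systemctl restart glusterd

    # Verify the peer state and that self-heal kicks in.
    gluster peer status
    gluster volume heal VOLNAME info

    # The per-volume checksum glusterd compares during the peer handshake
    # is stored here; on a healthy cluster it is identical on every node.
    cat /var/lib/glusterd/vols/VOLNAME/cksum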
Nikhil Ladha
2021-Dec-27 13:03 UTC
[Gluster-users] Unable to upgrade nodes because of cksums mismatch
Hi Michael,

I think you are hitting an issue similar to this one: https://github.com/gluster/glusterfs/issues/3066. If so, the fix for it is under review and could be available in the next release.

--
Thanks and Regards,
Nikhil Ladha
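For completeness, the finalization step referred to in the first mail (raising the cluster op-version once every node runs 10.x) looks roughly like this. A minimal sketch assuming the stock gluster CLI, where 100000 is the op-version corresponding to release 10.0:

    # Check the currently active cluster op-version:
    gluster volume get all cluster.op-version

    # Raise it only after ALL peers have been upgraded to 10.x:
    gluster volume set all cluster.op-version 100000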