Michael Böhm
2021-Dec-27 12:55 UTC
[Gluster-users] Unable to upgrade nodes because of cksums mismatch
Hey guys,

I have a problem upgrading our nodes from 8.3 to 10.0. I just upgraded the first node and ran into the "cksums mismatch" problem: on the upgraded v10 node the checksums for all volumes are different than on the other v8 nodes. That leads to the node starting in a "Peer Rejected" state. I can only resolve this by following the steps suggested here:

https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/
(stopping glusterd, deleting /var/lib/glusterd/* except glusterd.info, starting glusterd, probing a v8 peer, then restarting glusterd again)

The cluster then seems healthy again, self-healing starts and everything looks fine - only the newly created cksums still differ from those on the other nodes. That means the healthy state only lasts until I reboot the node, at which point it all begins again from the start: the node comes up as peer rejected.

Now, I've read about the problem here:
https://github.com/gluster/glusterfs/issues/1332
(even though that says the problem should only occur when upgrading from versions earlier than v7)
and also here on the mailing list:
https://lists.gluster.org/pipermail/gluster-users/2021-November/039679.html
(I think I have the same problem, but unfortunately no solution was given there)

The suggested solutions seem to require upgrading all nodes, and the problem should be resolved once op.version is finally raised - but I don't think this approach can be done online, and there's not really a way for me to do it offline.

Why is this happening now and not when I upgraded from pre-7 to 7? All my nodes are on 8.3 and op.version is 80000.

One thing I might have done "wrong": when I upgraded to v8 I didn't run "gluster volume set <volname> fips-mode-rchecksum on" on the volumes - I think I just overlooked it in the docs. I have this option set only on the 2 volumes I created after upgrading to v8. But even on those 2 the cksums differ, so I guess it wouldn't help a lot to set the option on all the other volumes?

I really don't know what to do now. I roughly understand the problem, but not why it happens on a cluster that is entirely on v8. I can't take all 9 nodes down, upgrade them all to v10 and rely on "it's all good" after the final upgrade of op.version.

Can someone point me in a safe direction?

Regards

Mika
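For reference, the recovery sequence described above, spelled out as a sketch. HOST_V8 and VOLNAME are placeholders, not names from the original report; run this on the rejected, upgraded node:

    systemctl stop glusterd

    # Wipe glusterd's local state but keep the node identity (UUID),
    # as per the "Resolving Peer Rejected" procedure linked above.
    cd /var/lib/glusterd
    find . -mindepth 1 ! -name 'glusterd.info' -delete

    systemctl start glusterd

    # Re-fetch the cluster configuration from a still-healthy v8 peer.
    gluster peer probe HOST_V8

    systemctl restart glusterd

    # Verify the peer state and that self-heal kicks in.
    gluster peer status
    gluster volume heal VOLNAME info

    # The per-volume checksum glusterd compares during the peer handshake
    # is stored here; on a healthy cluster it is identical on every node.
    cat /var/lib/glusterd/vols/VOLNAME/cksum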
Nikhil Ladha
2021-Dec-27 13:03 UTC
[Gluster-users] Unable to upgrade nodes because of cksums mismatch
Hi Michael,

I think you are hitting an issue similar to this one: https://github.com/gluster/glusterfs/issues/3066. If so, the fix for it is under review and could be available in the next release.

--
Thanks and Regards,
Nikhil Ladha
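For completeness, the finalization step referred to in the first mail (raising the cluster op-version once every node runs 10.x) looks roughly like this. A minimal sketch assuming the stock gluster CLI, where 100000 is the op-version corresponding to release 10.0:

    # Check the currently active cluster op-version:
    gluster volume get all cluster.op-version

    # Raise it only after ALL peers have been upgraded to 10.x:
    gluster volume set all cluster.op-version 100000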