Hi,

just upgraded some gluster servers from version 10.4 to version 11.1 (Debian bullseye & bookworm). As long as you only install the packages, everything is fine: servers, volumes etc. work as expected.

But one needs to test whether the systems still work after a daemon and/or server restart. Well, I did a reboot, and after that the rebooted/restarted system is "out". Log messages from a working node:

[2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163] [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 100000
[2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490] [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e
[2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010] [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management: Version of Cksums sourceimages differ. local cksum = 2204642525, remote cksum = 1931483801 on peer gluster190
[2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493] [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to gluster190 (0), ret: 0, op_ret: -1
[2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host: gluster190, port: 0

Peer status from the rebooted node:

root@gluster190 ~ # gluster peer status
Number of Peers: 2

Hostname: gluster189
Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
State: Peer Rejected (Connected)

Hostname: gluster188
Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
State: Peer Rejected (Connected)

So the rebooted gluster190 is not accepted anymore and thus no longer appears in "gluster volume status". I then followed this guide:

https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/

i.e. remove everything under /var/lib/glusterd/ (except glusterd.info), restart the glusterd service, and so on. The data gets copied over from the other nodes and 'gluster peer status' looks OK again - but the volume info is missing: /var/lib/glusterd/vols is empty. After syncing this directory over from another node, the volume is available again, heals start, etc.

Well, and just to be sure that everything works as it should, I rebooted that node again - the rebooted node gets kicked out again, and the whole procedure has to be repeated to bring it back.

Sorry, but did I miss anything? Has anyone experienced similar problems? I'll probably downgrade to 10.4 again, that version was working...

Thx,
Hubert
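PS: for the record, the recovery boiled down to roughly the following on my side. This is only a sketch of what I did, assuming systemd units and my hostnames (gluster189 is a healthy peer, gluster190 the rejected node) - adjust paths and names to your own setup:

# on the rejected node (gluster190)
systemctl stop glusterd

# keep glusterd.info, move everything else out of the way
mkdir -p /root/glusterd-backup
find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info \
    -exec mv -t /root/glusterd-backup/ {} +

systemctl start glusterd

# re-probe a healthy peer so the peer/volume data gets pulled in again
gluster peer probe gluster189
systemctl restart glusterd
gluster peer status

# in my case /var/lib/glusterd/vols stayed empty, so I also synced it over
rsync -a gluster189:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
systemctl restart glusterd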
Just downgraded one node to 10.4 and did a reboot - same result: cksum error. I'm able to bring the node back in again, but it seems that error would persist even when downgrading all servers...
Morning to those still reading :-)

I found this:

https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them

There's a paragraph about "Peer Rejected" with the very same error message, telling me: "Update the cluster.op-version". I had only updated the server nodes, not the clients, so raising the cluster.op-version wasn't possible at that point.

So... upgrading the clients to version 11.1 and then raising the op-version should solve the problem?

Thx,
Hubert
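PS: in case it helps anyone finding this thread later - once all servers and clients are on 11.x, checking and bumping the op-version should look roughly like this. A sketch only, I haven't run it yet; 110000 is, as far as I can tell from the docs, the op-version that corresponds to release 11 - please verify against the upgrade guide for your version:

# current op-version the cluster is running
gluster volume get all cluster.op-version

# highest op-version supported by all connected servers and clients
gluster volume get all cluster.max-op-version

# bump it once everything (servers AND clients) runs 11.x
gluster volume set all cluster.op-version 110000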