Hi,

just upgraded some gluster servers from version 10.4 to version 11.1 (Debian bullseye & bookworm). As long as you only install the packages, everything is fine: servers, volumes etc. work as expected.

But one needs to test whether the systems still work after a daemon and/or server restart. Well, I did a reboot, and after that the rebooted/restarted system is "out". Log messages from a working node:

[2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163] [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 100000
[2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490] [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e
[2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010] [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management: Version of Cksums sourceimages differ. local cksum = 2204642525, remote cksum = 1931483801 on peer gluster190
[2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493] [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to gluster190 (0), ret: 0, op_ret: -1
[2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host: gluster190, port: 0

Peer status from the rebooted node:

root@gluster190 ~ # gluster peer status
Number of Peers: 2

Hostname: gluster189
Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
State: Peer Rejected (Connected)

Hostname: gluster188
Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
State: Peer Rejected (Connected)

So the rebooted gluster190 is not accepted anymore and thus no longer appears in "gluster volume status". I then followed this guide:

https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/

i.e. remove everything under /var/lib/glusterd/ (except glusterd.info), restart the glusterd service, and so on. The data gets copied over from the other nodes and 'gluster peer status' looks OK again - but the volume info is missing: /var/lib/glusterd/vols is empty. After syncing this directory over from another node, the volume is available again, heals start, etc.

Well, and just to be sure that everything works as it should, I rebooted that node again - the rebooted node gets kicked out again, and the whole procedure has to be repeated to bring it back.

Sorry, but did I miss anything? Has anyone experienced similar problems? I'll probably downgrade to 10.4 again, that version was working...

Thx,
Hubert
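PS: for the record, the recovery boiled down to roughly the following on my side. This is only a sketch of what I did, assuming systemd units and my hostnames (gluster189 is a healthy peer, gluster190 the rejected node) - adjust paths and names to your own setup:

# on the rejected node (gluster190)
systemctl stop glusterd

# keep glusterd.info, move everything else out of the way
mkdir -p /root/glusterd-backup
find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info \
    -exec mv -t /root/glusterd-backup/ {} +

systemctl start glusterd

# re-probe a healthy peer so the peer/volume data gets pulled in again
gluster peer probe gluster189
systemctl restart glusterd
gluster peer status

# in my case /var/lib/glusterd/vols stayed empty, so I also synced it over
rsync -a gluster189:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/
systemctl restart glusterd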
Just downgraded one node to 10.4 and did a reboot - same result: cksum error. I'm able to bring the node back in again, but it seems that error would persist even when downgrading all servers...
Morning to those still reading :-)

I found this:

https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them

There's a paragraph about "Peer Rejected" with the very same error message, telling me: "Update the cluster.op-version". I had only updated the server nodes, not the clients, so raising the cluster.op-version wasn't possible at that point.

So... upgrading the clients to version 11.1 and then raising the op-version should solve the problem?

Thx,
Hubert
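PS: in case it helps anyone finding this thread later - once all servers and clients are on 11.x, checking and bumping the op-version should look roughly like this. A sketch only, I haven't run it yet; 110000 is, as far as I can tell from the docs, the op-version that corresponds to release 11 - please verify against the upgrade guide for your version:

# current op-version the cluster is running
gluster volume get all cluster.op-version

# highest op-version supported by all connected servers and clients
gluster volume get all cluster.max-op-version

# bump it once everything (servers AND clients) runs 11.x
gluster volume set all cluster.op-version 110000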