Ok, finally managed to get all servers, volumes etc. running, but it took a couple of restarts, cksum checks etc.

One problem remains: a volume doesn't heal automatically, or doesn't heal at all.

gluster volume status
Status of volume: workdata
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick glusterpub1:/gluster/md3/workdata     58832     0          Y       3436
Brick glusterpub2:/gluster/md3/workdata     59315     0          Y       1526
Brick glusterpub3:/gluster/md3/workdata     56917     0          Y       1952
Brick glusterpub1:/gluster/md4/workdata     59688     0          Y       3755
Brick glusterpub2:/gluster/md4/workdata     60271     0          Y       2271
Brick glusterpub3:/gluster/md4/workdata     49461     0          Y       2399
Brick glusterpub1:/gluster/md5/workdata     54651     0          Y       4208
Brick glusterpub2:/gluster/md5/workdata     49685     0          Y       2751
Brick glusterpub3:/gluster/md5/workdata     59202     0          Y       2803
Brick glusterpub1:/gluster/md6/workdata     55829     0          Y       4583
Brick glusterpub2:/gluster/md6/workdata     50455     0          Y       3296
Brick glusterpub3:/gluster/md6/workdata     50262     0          Y       3237
Brick glusterpub1:/gluster/md7/workdata     52238     0          Y       5014
Brick glusterpub2:/gluster/md7/workdata     52474     0          Y       3673
Brick glusterpub3:/gluster/md7/workdata     57966     0          Y       3653
Self-heal Daemon on localhost               N/A       N/A        Y       4141
Self-heal Daemon on glusterpub1             N/A       N/A        Y       5570
Self-heal Daemon on glusterpub2             N/A       N/A        Y       4139

"gluster volume heal workdata info" lists a lot of files per brick.
"gluster volume heal workdata statistics heal-count" shows thousands of files per brick.
"gluster volume heal workdata enable" has no effect.

gluster volume heal workdata full
Launching heal operation to perform full self heal on volume workdata has been successful
Use heal info commands to check status.

-> not doing anything at all. And nothing is happening on the 2 "good" servers in e.g. glustershd.log. Heal was working as expected on version 10.4, but here... silence. Does someone have an idea?

Best regards,
Hubert
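A few shell checks that can help narrow down why the self-heal daemon stays silent. This is only a minimal sketch using the volume name from the status output above, not something that was run in this thread; it assumes the default glustershd log location.

# is self-healing actually enabled, and does every brick show up as connected?
gluster volume get workdata cluster.self-heal-daemon
gluster volume heal workdata info summary

# "start ... force" on an already running volume respawns any missing brick
# processes and is a common way to kick the self-heal daemon, without
# touching client mounts
gluster volume start workdata force

# afterwards the shd log should show activity again
tail -f /var/log/glusterfs/glustershd.log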
On Tue, 16 Jan 2024 at 13:44, Gilberto Ferreira <gilberto.nunes32 at gmail.com> wrote:
>
> Ah! Indeed! You need to perform an upgrade on the clients as well.
>
> On Tue, 16 Jan 2024 at 03:12, Hu Bert <revirii at googlemail.com> wrote:
>>
>> Morning to those still reading :-)
>>
>> I found this: https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
>>
>> There's a paragraph about "peer rejected" with the same error message,
>> telling me: "Update the cluster.op-version". I had only updated the
>> server nodes, but not the clients, so upgrading the cluster.op-version
>> wasn't possible at that time. So... upgrading the clients to version
>> 11.1 and then the op-version should solve the problem?
>>
>> Thx,
>> Hubert
>>
>> On Mon, 15 Jan 2024 at 09:16, Hu Bert <revirii at googlemail.com> wrote:
>> >
>> > Hi,
>> > I just upgraded some gluster servers from version 10.4 to version 11.1
>> > (Debian bullseye & bookworm). When only installing the packages: good,
>> > servers, volumes etc. work as expected.
>> >
>> > But one needs to test whether the systems still work after a daemon
>> > and/or server restart. Well, I did a reboot, and after that the
>> > rebooted/restarted system is "out". Log message from a working node:
>> >
>> > [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163]
>> > [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
>> > 0-management: using the op-version 100000
>> > [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490]
>> > [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
>> > 0-glusterd: Received probe from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e
>> > [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010]
>> > [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
>> > Version of Cksums sourceimages differ. local cksum = 2204642525,
>> > remote cksum = 1931483801 on peer gluster190
>> > [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493]
>> > [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
>> > Responded to gluster190 (0), ret: 0, op_ret: -1
>> > [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493]
>> > [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
>> > Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
>> > gluster190, port: 0
>> >
>> > Peer status from the rebooted node:
>> >
>> > root at gluster190 ~ # gluster peer status
>> > Number of Peers: 2
>> >
>> > Hostname: gluster189
>> > Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
>> > State: Peer Rejected (Connected)
>> >
>> > Hostname: gluster188
>> > Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
>> > State: Peer Rejected (Connected)
>> >
>> > So the rebooted gluster190 is not accepted anymore and thus does not
>> > appear in "gluster volume status". I then followed this guide:
>> >
>> > https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
>> >
>> > Remove everything under /var/lib/glusterd/ (except glusterd.info) and
>> > restart the glusterd service etc. The data gets copied from the other
>> > nodes and 'gluster peer status' is ok again - but the volume info is
>> > missing, /var/lib/glusterd/vols is empty. When syncing this dir from
>> > another node, the volume is available again, heals start etc.
>> >
>> > Well, and just to be sure that everything's working as it should, I
>> > rebooted that node again - and the rebooted node is kicked out again,
>> > and you have to repeat the whole procedure to bring it back.
>> >
>> > Sorry, but did I miss anything? Has someone experienced similar
>> > problems? I'll probably downgrade to 10.4 again, that version was
>> > working...
>> >
>> > Thx,
>> > Hubert
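Side note on the quoted cluster.op-version point: once all servers and all clients run 11.x, the op-version can be checked and bumped from any server node. A sketch only; the number to set is whatever max-op-version reports on your cluster, not necessarily the value shown here.

# current cluster op-version and the highest one this installation supports
gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version

# bump it only after every server and client has been upgraded,
# e.g. if max-op-version reports 110000:
gluster volume set all cluster.op-version 110000

# the per-volume checksum that glusterd compares during the peer handshake
# ("Version of Cksums ... differ" in the quoted log) lives here, per volume,
# and should be identical on all peers:
cat /var/lib/glusterd/vols/workdata/cksum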
Hm, I only see messages like these in glustershd.log on the 2 good servers:

[2024-01-17 12:18:48.912952 +0000] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-6:
remote operation failed. [{path=<gfid:ee28b56c-e352-48f8-bbb5-dbf31babe073>},
{gfid=ee28b56c-e352-48f8-bbb5-dbf31babe073}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:18:48.913015 +0000] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-7:
remote operation failed. [{path=<gfid:ee28b56c-e352-48f8-bbb5-dbf31babe073>},
{gfid=ee28b56c-e352-48f8-bbb5-dbf31babe073}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:19:09.450335 +0000] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-10:
remote operation failed. [{path=<gfid:ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0>},
{gfid=ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0}, {errno=2}, {error=No such file or directory}]
[2024-01-17 12:19:09.450771 +0000] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-9:
remote operation failed. [{path=<gfid:ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0>},
{gfid=ea4a63e3-1470-40a5-8a7e-2a1061a8fcb0}, {errno=2}, {error=No such file or directory}]

Not sure if this is important.
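For what it's worth, a gfid from those warnings can be mapped back to a real file name directly on a brick of a "good" server. This is only a sketch: the brick path below is a guess (workdata-client-6/-7 should correspond to the md5 replica set, going by the brick order in "volume info"); regular files are hardlinked under .glusterfs, so -samefile finds the named path, while directories show up there as a symlink instead.

GFID=ee28b56c-e352-48f8-bbb5-dbf31babe073
BRICK=/gluster/md5/workdata   # adjust to the brick that client-6/-7 maps to
ls -li "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -not -path '*/.glusterfs/*'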
On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii at googlemail.com> wrote:
>
> -> not doing anything at all. And nothing happening on the 2 "good"
> servers in e.g. glustershd.log. Heal was working as expected on
> version 10.4, but here... silence. Someone has an idea?
Good morning,

the heal is still not running. Pending heals now sum up to 60K per brick. With version 10.4 the heal started instantly, e.g. after a server reboot, but with version 11 it doesn't. What could be wrong?

I only see these errors on one of the "good" servers in glustershd.log:

[2024-01-18 06:08:57.328480 +0000] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
remote operation failed. [{path=<gfid:cb39a1e4-2a4c-4727-861d-3ed9ef00681b>},
{gfid=cb39a1e4-2a4c-4727-861d-3ed9ef00681b}, {errno=2}, {error=No such file or directory}]
[2024-01-18 06:08:57.594051 +0000] W [MSGID: 114031]
[client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
remote operation failed. [{path=<gfid:3e9b178c-ae1f-4d85-ae47-fc539d94dd11>},
{gfid=3e9b178c-ae1f-4d85-ae47-fc539d94dd11}, {errno=2}, {error=No such file or directory}]

About 7K of these today. Any ideas? Someone?

Best regards,
Hubert
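Again only a sketch of things worth checking when pending heals keep growing: the file path below is a made-up placeholder (take a real one from "heal info"), and the grep merely looks for the sweep messages the self-heal daemon normally writes when a crawl actually starts.

# pending-heal markers on a brick copy of one of the listed files:
getfattr -d -m . -e hex /gluster/md3/workdata/path/to/some/file
# non-zero trusted.afr.workdata-client-* values mean heals are pending
# against that brick; if they never go down, the shd isn't processing them

# did the "full" heal ever start a crawl on the good servers?
grep -i sweep /var/log/glusterfs/glustershd.log | tail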
On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii at googlemail.com> wrote:
>
> "gluster volume heal workdata info" lists a lot of files per brick.
> "gluster volume heal workdata statistics heal-count" shows thousands
> of files per brick.
> "gluster volume heal workdata enable" has no effect.