Andreas Schwibbe
2024-Sep-29 08:47 UTC
[Gluster-users] Growing cluster: peering worked, staging failed
Fellow gluster users,

I am trying to extend a 3-node cluster that has been serving me very reliably for a long time now.
The cluster is serving two volumes:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (2 + 1) = 18
Transport-type: tcp

Volume Name: gv1
Type: Replicate
Volume ID: 69a12600-6720-4e96-a269-931d72d4953e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp

Adding the peer to the cluster with

gluster peer probe node4

works.

gluster peer status
gluster pool list

show all peers connected on every node.

However, running

gluster v status

causes:

Staging failed on node4. Error: Volume gv0 does not exist
Staging failed on node4. Error: Volume gv1 does not exist

On node4 I can see that /var/lib/glusterd/vols is empty!

/var/log/glusterfs/glusterd.log shows:

[2024-09-29 07:24:32.815997 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:986:gd_import_new_brick_snap_details] 0-management: volume1.brick1.origin_path missing in payload
[2024-09-29 07:24:32.816025 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1004:gd_import_new_brick_snap_details] 0-management: volume1.brick1.snap_type missing in payload
[2024-09-29 07:24:32.924139 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1093:gd_import_volume_snap_details] 0-management: volume1.restored_from_snapname_id missing in payload for gv0
[2024-09-29 07:24:32.924158 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1104:gd_import_volume_snap_details] 0-management: volume1.restored_from_snapname missing in payload for gv0
[2024-09-29 07:24:32.924175 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1115:gd_import_volume_snap_details] 0-management: volume1.snap_plugin missing in payload for gv0
[2024-09-29 07:24:32.924365 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:986:gd_import_new_brick_snap_details] 0-management: volume1.brick1.origin_path missing in payload
[2024-09-29 07:24:32.924385 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1004:gd_import_new_brick_snap_details] 0-management: volume1.brick1.snap_type missing in payload
[2024-09-29 07:24:43.516131 +0000] E [MSGID: 106048] [glusterd-op-sm.c:1814:glusterd_op_stage_status_volume] 0-management: Failed to get volinfo [{Volume=gv0}]
[2024-09-29 07:24:43.516209 +0000] E [MSGID: 106301] [glusterd-op-sm.c:5870:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Status', Status : -1
[2024-09-29 07:24:43.524640 +0000] E [MSGID: 106048] [glusterd-op-sm.c:1814:glusterd_op_stage_status_volume] 0-management: Failed to get volinfo [{Volume=gv1}]
[2024-09-29 07:24:43.524683 +0000] E [MSGID: 106301] [glusterd-op-sm.c:5870:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Status', Status : -1

The cluster is running Ubuntu 20.04 with GlusterFS 9.6-ubuntu1~focal1.
The new node is running Ubuntu 24.04 with GlusterFS 11.1-4ubuntu0.1.

I know there is a version mismatch. From the General Upgrade procedure and the v11 release notes I understood that one can upgrade directly from 9 -> 11, which I am trying to do by adding new nodes one after another and moving bricks over to the new nodes step by step.
There is also a kind of deadlock: the launchpad Gluster Ubuntu repo does not have an installation target for v9 on Ubuntu 24.04, and v10 does not run due to python errors when installed via the launchpad repo.
I have checked firewall, apparmor, switches and cabling.

Any ideas or comments on the problem or the approach?

Many thanks.
A.
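The glusterd.log excerpt in the message above mixes two kinds of errors: "missing in payload" messages from the snapshot import code and "Failed to get volinfo" / "Stage failed" messages from the staging code. Below is a minimal Python sketch for grouping such lines by MSGID and function name. The regular expression is inferred only from the lines quoted in this thread, and the names LOG_LINE and summarize_errors are hypothetical helpers, not part of Gluster.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official description of the glusterd log format).
LOG_LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<msg>.*)"
)


def summarize_errors(lines):
    """Count error-level ('E') entries per (MSGID, function) pair."""
    counts = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") == "E":
            counts[(match.group("msgid"), match.group("func"))] += 1
    return counts


# Three lines copied from the excerpt above, rejoined onto single lines.
sample = [
    "[2024-09-29 07:24:32.815997 +0000] E [MSGID: 106061] "
    "[glusterd-snapshot-utils.c:986:gd_import_new_brick_snap_details] "
    "0-management: volume1.brick1.origin_path missing in payload",
    "[2024-09-29 07:24:43.516131 +0000] E [MSGID: 106048] "
    "[glusterd-op-sm.c:1814:glusterd_op_stage_status_volume] "
    "0-management: Failed to get volinfo [{Volume=gv0}]",
    "[2024-09-29 07:24:43.524640 +0000] E [MSGID: 106048] "
    "[glusterd-op-sm.c:1814:glusterd_op_stage_status_volume] "
    "0-management: Failed to get volinfo [{Volume=gv1}]",
]

counts = summarize_errors(sample)
for (msgid, func), n in sorted(counts.items()):
    print(msgid, func, n)
```

To run it against a real log, one could pass an open file handle for /var/log/glusterfs/glusterd.log instead of the sample list. Lines that do not match the inferred pattern are skipped silently.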
Andreas Schwibbe
2024-Oct-03 08:47 UTC
[Gluster-users] Growing cluster: peering worked, staging failed
Solving my own issue:

Staging fails due to a checksum error.
The checksum error occurs when you try to upgrade a cluster that has the option nfs.disable set, as this option is made optional in Gluster 11.

You have to upgrade the whole cluster to 11. After that, peer probe is successful, because the nfs.disable option is removed during the upgrade on all nodes, resulting in matching checksums.

Unfortunately, this means there is no online upgrade procedure.

Findings, errors etc. are documented in this issue: https://github.com/gluster/glusterfs/issues/4386

Cheers,
A.
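The explanation in this reply (volume checksums differ because nfs.disable is stored on the v9 nodes but not on v11) can be illustrated with a small Python sketch that compares the option sets of two nodes. It assumes the per-volume info file under /var/lib/glusterd/vols/ is plain key=value text, which is an assumption about the on-disk layout and is not confirmed in this thread. The sample contents and the helper names parse_info and diff_options are hypothetical.

```python
def parse_info(text):
    """Parse key=value lines into a dict (assumed file layout, see above)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def diff_options(old, new):
    """Return {key: (old_value, new_value)} for every key that differs."""
    diff = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            diff[key] = (old.get(key), new.get(key))
    return diff


# Hypothetical contents, made up for illustration only.
old_node = parse_info("type=2\nstatus=1\nnfs.disable=on\n")
new_node = parse_info("type=2\nstatus=1\n")

print(diff_options(old_node, new_node))
```

Any key reported here is stored on one node but not the other, or with a different value. According to the reply above, such a difference in nfs.disable is what makes the checksums disagree.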