I've never had such a situation and I don't recall anyone sharing something similar.

Most probably it's easier to remove the node from the TSP and re-add it.
Of course, test the case in VMs just to validate that it's possible to add a node to a cluster with snapshots.

I have a vague feeling that you will need to delete all snapshots.

Best Regards,
Strahil Nikolov

On Thursday, August 10, 2023, 4:36 AM, Sebastian Neustein <sebastian.neustein at arc-aachen.de> wrote:

    Hi

    Due to an outage of one node, after bringing it up again, the node has some
    orphaned snapshots, which are already deleted on the other nodes.

    How can I delete these orphaned snapshots? Trying the normal way produces
    these errors:

    [2023-08-08 19:34:03.667109 +0000] E [MSGID: 106115] [glusterd-mgmt.c:118:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on B742. Please check log file for details.
    [2023-08-08 19:34:03.667184 +0000] E [MSGID: 106115] [glusterd-mgmt.c:118:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on B741. Please check log file for details.
    [2023-08-08 19:34:03.667210 +0000] E [MSGID: 106121] [glusterd-mgmt.c:1083:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
    [2023-08-08 19:34:03.667236 +0000] E [MSGID: 106121] [glusterd-mgmt.c:2875:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed

    Even worse: I followed the Red Hat Gluster snapshot troubleshooting guide
    <https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/troubleshooting_snapshots>
    and deleted one of the directories defining a snapshot. Now I receive this on the CLI:

    run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107243 +0000] M [MSGID: 113075] [posix-helpers.c:2161:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: health-check failed, going down
    run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107243 +0000] M [MSGID: 113075] [posix-helpers.c:2161:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: health-check failed, going down
    run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107292 +0000] M [MSGID: 113075] [posix-helpers.c:2179:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: still alive! -> SIGTERM
    run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107292 +0000] M [MSGID: 113075] [posix-helpers.c:2179:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: still alive! -> SIGTERM

    What are my options?
    - Is there an easy way to remove all those snapshots?
    - Or would it be easier to remove and rejoin the node to the gluster cluster?

    Thank you for any help!

    Seb

    --
    Sebastian Neustein

    Airport Research Center GmbH
    Bismarckstraße 61
    52066 Aachen
    Germany

    Phone: +49 241 16843-23
    Fax: +49 241 16843-19
    e-mail: sebastian.neustein at arc-aachen.de
    Website: http://www.airport-consultants.com

    Register Court: Amtsgericht Aachen HRB 7313
    Ust-Id-No.: DE196450052

    Managing Director:
    Dipl.-Ing. Tom Alexander Heuer
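A rough, untested sketch of the remove/re-add flow suggested above might look like the following. The volume name ("glvol"), the hostname of the affected node ("badnode") and the brick path are placeholders, the replica counts assume a 1 x 3 replicated volume, and, as noted above, existing snapshots may have to be deleted before the cluster accepts the brick again:

  # Untested sketch only -- "glvol", "badnode" and the brick path are placeholders.
  # On a healthy peer: drop the failed node's brick, shrinking the replica set 3 -> 2.
  gluster volume remove-brick glvol replica 2 badnode:/data/glusterfs/glvol/brick0/brick force

  # Detach the node from the trusted storage pool, then re-probe it.
  gluster peer detach badnode
  gluster peer probe badnode

  # Re-add the brick (the old brick directory typically has to be emptied or
  # recreated first) and trigger a full self-heal to repopulate it.
  gluster volume add-brick glvol replica 3 badnode:/data/glusterfs/glvol/brick0/brick
  gluster volume heal glvol full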
Thank you for your reply! I will set up a test environment and see what will happen.

On 10.08.2023 at 20:33, Strahil Nikolov wrote:
> I've never had such a situation and I don't recall anyone sharing something similar.
>
> Most probably it's easier to remove the node from the TSP and re-add it.
> Of course, test the case in VMs just to validate that it's possible to add a node
> to a cluster with snapshots.
>
> I have a vague feeling that you will need to delete all snapshots.
>
> Best Regards,
> Strahil Nikolov
On 10.08.2023 at 20:33, Strahil Nikolov wrote:
> I've never had such a situation and I don't recall anyone sharing something similar.

That's strange, it is really easy to reproduce. This is from a fresh test environment.

Summary:
- There is one snapshot present.
- On one node glusterd is stopped.
- While glusterd is stopped, the snapshot is deleted.
- The node is brought up again.
- On that node there is now an orphaned snapshot.

Detailed version:

# on node 1:
root at gl1:~# cat /etc/debian_version
11.7
root at gl1:~# gluster --version
glusterfs 10.4
root at gl1:~# gluster volume info

Volume Name: glvol_samba
Type: Replicate
Volume ID: 91cb059e-10e4-4439-92ea-001065652749
Status: Started
Snapshot Count: 1
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gl1:/data/glusterfs/glvol_samba/brick0/brick
Brick2: gl2:/data/glusterfs/glvol_samba/brick0/brick
Brick3: gl3:/data/glusterfs/glvol_samba/brick0/brick
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.barrier: disable

root at gl1:~# gluster snapshot list
snaps_GMT-2023.08.15-13.05.28

# on node 3:
root at gl3:~# systemctl stop glusterd.service

# on node 1:
root at gl1:~# gluster snapshot deactivate snaps_GMT-2023.08.15-13.05.28
Deactivating snap will make its data inaccessible. Do you want to continue? (y/n) y
Snapshot deactivate: snaps_GMT-2023.08.15-13.05.28: Snap deactivated successfully
root at gl1:~# gluster snapshot delete snaps_GMT-2023.08.15-13.05.28
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snaps_GMT-2023.08.15-13.05.28: snap removed successfully
root at gl1:~# gluster snapshot list
No snapshots present

# on node 3:
root at gl3:~# systemctl start glusterd.service
root at gl3:~# gluster snapshot list
snaps_GMT-2023.08.15-13.05.28
root at gl3:~# gluster snapshot deactivate snaps_GMT-2023.08.15-13.05.28
Deactivating snap will make its data inaccessible. Do you want to continue? (y/n) y
snapshot deactivate: failed: Pre Validation failed on gl1.ad.arc.de. Snapshot (snaps_GMT-2023.08.15-13.05.28) does not exist.
Pre Validation failed on gl2. Snapshot (snaps_GMT-2023.08.15-13.05.28) does not exist.
Snapshot command failed

root at gl3:~# lvs -a
  LV                                  VG        Attr       LSize  Pool      Origin    Data%  Meta%  Move Log Cpy%Sync Convert
  669cbc14fa7542acafb2995666284583_0  vg_brick0 Vwi-aotz-- 15,00g tp_brick0 lv_brick0 0,08
  lv_brick0                           vg_brick0 Vwi-aotz-- 15,00g tp_brick0           0,08
  [lvol0_pmspare]                     vg_brick0 ewi------- 20,00m
  tp_brick0                           vg_brick0 twi-aotz-- 18,00g                     0,12   10,57
  [tp_brick0_tdata]                   vg_brick0 Twi-ao---- 18,00g
  [tp_brick0_tmeta]                   vg_brick0 ewi-ao---- 20,00m

Would it be dangerous to just delete the following items on node 3 while gluster is down:
- the orphaned directories in /var/lib/glusterd/snaps/
- the orphaned LV, here 669cbc14fa7542acafb2995666284583_0

Or is there a self-heal command?

Regards
Sebastian
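For discussion, a rough and untested sketch of what that manual cleanup on node 3 could look like, using only the names visible in the output above; whether this is actually safe, or whether a supported reconciliation path exists, is exactly the open question here:

  # Untested sketch -- verify every name before running anything.
  systemctl stop glusterd.service

  # If the orphaned snapshot brick is still mounted somewhere under
  # /run/gluster/snaps/ (as in the brick logs from the first message),
  # unmount it first:
  #   mount | grep /run/gluster/snaps
  #   umount <that mountpoint>

  # Remove the stale snapshot definition (check the exact directory name
  # first with: ls /var/lib/glusterd/snaps/) ...
  rm -rf /var/lib/glusterd/snaps/snaps_GMT-2023.08.15-13.05.28

  # ... and the thin LV that backed the orphaned snapshot brick.
  lvremove vg_brick0/669cbc14fa7542acafb2995666284583_0

  systemctl start glusterd.service
  gluster snapshot list    # should now agree with the other nodes

If this leaves glusterd in an inconsistent state, the detach/re-probe route sketched earlier in the thread is probably the cleaner fallback.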