I've never had such a situation and I don't recall anyone sharing something similar.

Most probably it's easier to remove the node from the TSP and re-add it.
Of course, test the case in VMs just to validate that it's possible to add a node to a cluster with snapshots.

I have a vague feeling that you will need to delete all snapshots.

Best Regards,
Strahil Nikolov

On Thursday, August 10, 2023, 4:36 AM, Sebastian Neustein <sebastian.neustein at arc-aachen.de> wrote:

    Hi

    Due to an outage of one node, after bringing it up again, the node has some
    orphaned snapshots, which are already deleted on the other nodes.

    How can I delete these orphaned snapshots? Trying the normal way produces
    these errors:

    [2023-08-08 19:34:03.667109 +0000] E [MSGID: 106115] [glusterd-mgmt.c:118:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on B742. Please check log file for details.
    [2023-08-08 19:34:03.667184 +0000] E [MSGID: 106115] [glusterd-mgmt.c:118:gd_mgmt_v3_collate_errors] 0-management: Pre Validation failed on B741. Please check log file for details.
    [2023-08-08 19:34:03.667210 +0000] E [MSGID: 106121] [glusterd-mgmt.c:1083:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed on peers
    [2023-08-08 19:34:03.667236 +0000] E [MSGID: 106121] [glusterd-mgmt.c:2875:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed

    Even worse: I followed the Red Hat Gluster snapshot troubleshooting guide
    <https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.5/html/administration_guide/troubleshooting_snapshots>
    and deleted one of the directories defining a snapshot. Now I receive this on the CLI:

    run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107243 +0000] M [MSGID: 113075] [posix-helpers.c:2161:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: health-check failed, going down
    run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107243 +0000] M [MSGID: 113075] [posix-helpers.c:2161:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: health-check failed, going down
    run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107292 +0000] M [MSGID: 113075] [posix-helpers.c:2179:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: still alive! -> SIGTERM
    run-gluster-snaps-e4dcd4166538414c849fa91b0b3934d7-brick6-brick[297342]: [2023-08-09 08:59:41.107292 +0000] M [MSGID: 113075] [posix-helpers.c:2179:posix_health_check_thread_proc] 0-e4dcd4166538414c849fa91b0b3934d7-posix: still alive! -> SIGTERM

    What are my options?
    - Is there an easy way to remove all those snapshots?
    - Or would it be easier to remove and rejoin the node to the gluster cluster?

    Thank you for any help!

    Seb

    --
    Sebastian Neustein

    Airport Research Center GmbH
    Bismarckstraße 61
    52066 Aachen
    Germany

    Phone: +49 241 16843-23
    Fax: +49 241 16843-19
    e-mail: sebastian.neustein at arc-aachen.de
    Website: http://www.airport-consultants.com

    Register Court: Amtsgericht Aachen HRB 7313
    Ust-Id-No.: DE196450052

    Managing Director:
    Dipl.-Ing. Tom Alexander Heuer
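A rough, untested sketch of the remove/re-add flow suggested above might look like the following. The volume name ("glvol"), the hostname of the affected node ("badnode") and the brick path are placeholders, the replica counts assume a 1 x 3 replicated volume, and, as noted above, existing snapshots may have to be deleted before the cluster accepts the brick again:

  # Untested sketch only -- "glvol", "badnode" and the brick path are placeholders.
  # On a healthy peer: drop the failed node's brick, shrinking the replica set 3 -> 2.
  gluster volume remove-brick glvol replica 2 badnode:/data/glusterfs/glvol/brick0/brick force

  # Detach the node from the trusted storage pool, then re-probe it.
  gluster peer detach badnode
  gluster peer probe badnode

  # Re-add the brick (the old brick directory typically has to be emptied or
  # recreated first) and trigger a full self-heal to repopulate it.
  gluster volume add-brick glvol replica 3 badnode:/data/glusterfs/glvol/brick0/brick
  gluster volume heal glvol full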
Thank you for your reply! I will set up a test environment and see what will happen.

On 10.08.2023 at 20:33, Strahil Nikolov wrote:
> I've never had such a situation and I don't recall anyone sharing something similar.
>
> Most probably it's easier to remove the node from the TSP and re-add it.
> Of course, test the case in VMs just to validate that it's possible to add a node
> to a cluster with snapshots.
>
> I have a vague feeling that you will need to delete all snapshots.
>
> Best Regards,
> Strahil Nikolov
On 10.08.2023 at 20:33, Strahil Nikolov wrote:
> I've never had such a situation and I don't recall anyone sharing something similar.

That's strange, it is really easy to reproduce. This is from a fresh test environment.

Summary:
- There is one snapshot present.
- On one node glusterd is stopped.
- While glusterd is stopped, the snapshot is deleted.
- The node is brought up again.
- On that node there is now an orphaned snapshot.

Detailed version:

# on node 1:
root at gl1:~# cat /etc/debian_version
11.7
root at gl1:~# gluster --version
glusterfs 10.4
root at gl1:~# gluster volume info

Volume Name: glvol_samba
Type: Replicate
Volume ID: 91cb059e-10e4-4439-92ea-001065652749
Status: Started
Snapshot Count: 1
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gl1:/data/glusterfs/glvol_samba/brick0/brick
Brick2: gl2:/data/glusterfs/glvol_samba/brick0/brick
Brick3: gl3:/data/glusterfs/glvol_samba/brick0/brick
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.barrier: disable

root at gl1:~# gluster snapshot list
snaps_GMT-2023.08.15-13.05.28

# on node 3:
root at gl3:~# systemctl stop glusterd.service

# on node 1:
root at gl1:~# gluster snapshot deactivate snaps_GMT-2023.08.15-13.05.28
Deactivating snap will make its data inaccessible. Do you want to continue? (y/n) y
Snapshot deactivate: snaps_GMT-2023.08.15-13.05.28: Snap deactivated successfully
root at gl1:~# gluster snapshot delete snaps_GMT-2023.08.15-13.05.28
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snaps_GMT-2023.08.15-13.05.28: snap removed successfully
root at gl1:~# gluster snapshot list
No snapshots present

# on node 3:
root at gl3:~# systemctl start glusterd.service
root at gl3:~# gluster snapshot list
snaps_GMT-2023.08.15-13.05.28
root at gl3:~# gluster snapshot deactivate snaps_GMT-2023.08.15-13.05.28
Deactivating snap will make its data inaccessible. Do you want to continue? (y/n) y
snapshot deactivate: failed: Pre Validation failed on gl1.ad.arc.de. Snapshot (snaps_GMT-2023.08.15-13.05.28) does not exist.
Pre Validation failed on gl2. Snapshot (snaps_GMT-2023.08.15-13.05.28) does not exist.
Snapshot command failed

root at gl3:~# lvs -a
  LV                                  VG        Attr       LSize  Pool      Origin    Data%  Meta%  Move Log Cpy%Sync Convert
  669cbc14fa7542acafb2995666284583_0  vg_brick0 Vwi-aotz-- 15,00g tp_brick0 lv_brick0 0,08
  lv_brick0                           vg_brick0 Vwi-aotz-- 15,00g tp_brick0           0,08
  [lvol0_pmspare]                     vg_brick0 ewi------- 20,00m
  tp_brick0                           vg_brick0 twi-aotz-- 18,00g                     0,12   10,57
  [tp_brick0_tdata]                   vg_brick0 Twi-ao---- 18,00g
  [tp_brick0_tmeta]                   vg_brick0 ewi-ao---- 20,00m

Would it be dangerous to just delete the following items on node 3 while gluster is down:
- the orphaned directories in /var/lib/glusterd/snaps/
- the orphaned LV, here 669cbc14fa7542acafb2995666284583_0

Or is there a self-heal command?

Regards
Sebastian
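For discussion, a rough and untested sketch of what that manual cleanup on node 3 could look like, using only the names visible in the output above; whether this is actually safe, or whether a supported reconciliation path exists, is exactly the open question here:

  # Untested sketch -- verify every name before running anything.
  systemctl stop glusterd.service

  # If the orphaned snapshot brick is still mounted somewhere under
  # /run/gluster/snaps/ (as in the brick logs from the first message),
  # unmount it first:
  #   mount | grep /run/gluster/snaps
  #   umount <that mountpoint>

  # Remove the stale snapshot definition (check the exact directory name
  # first with: ls /var/lib/glusterd/snaps/) ...
  rm -rf /var/lib/glusterd/snaps/snaps_GMT-2023.08.15-13.05.28

  # ... and the thin LV that backed the orphaned snapshot brick.
  lvremove vg_brick0/669cbc14fa7542acafb2995666284583_0

  systemctl start glusterd.service
  gluster snapshot list    # should now agree with the other nodes

If this leaves glusterd in an inconsistent state, the detach/re-probe route sketched earlier in the thread is probably the cleaner fallback.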