Thorsten Walk
2021-Nov-05 17:28 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
Hi Guys, I pushed some VMs to the GlusterFS storage this week and ran them
there. For a maintenance task I moved these VMs to Proxmox-Node-2 and took
Node-1 offline for a short time. After moving them back to Node-1, a few
orphaned file entries were left behind (see attachment). In the logs I can't
find anything about the gfids :)

[15:36:51] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
># gvi

Cluster:
    Status: Healthy          GlusterFS: 9.3
    Nodes: 3/3               Volumes: 1/1

Volumes:
    glusterfs-1-volume
        Replicate        Started (UP) - 3/3 Bricks Up - (Arbiter Volume)
        Capacity: (17.89% used) 83.00 GiB/466.00 GiB (used/total)
        Self-Heal:
            192.168.1.51:/data/glusterfs (4 File(s) to heal).
        Bricks:
            Distribute Group 1:
                192.168.1.50:/data/glusterfs   (Online)
                192.168.1.51:/data/glusterfs   (Online)
                192.168.1.40:/data/glusterfs   (Online)

Brick 192.168.1.50:/data/glusterfs
Status: Connected
Number of entries: 0

Brick 192.168.1.51:/data/glusterfs
<gfid:ade6f31c-b80b-457e-a054-6ca1548d9cd3>
<gfid:39365c96-296b-4270-9cdb-1b751e40ad86>
<gfid:54774d44-26a7-4954-a657-6e4fa79f2b97>
<gfid:d5a8ae04-7301-4876-8d32-37fcd6093977>
Status: Connected
Number of entries: 4

Brick 192.168.1.40:/data/glusterfs
Status: Connected
Number of entries: 0

[15:37:03] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
># cat /data/glusterfs/.glusterfs/ad/e6/ade6f31c-b80b-457e-a054-6ca1548d9cd3
22962

[15:37:13] [ssh:root at pve02(192.168.1.51): /home/darkiop (755)]
># grep -ir 'ade6f31c-b80b-457e-a054-6ca1548d9cd3' /var/log/glusterfs/*.log

On Mon, Nov 1, 2021 at 07:51, Thorsten Walk <darkiop at gmail.com> wrote:

> After deleting the file, the output of heal info is clear.
>
> > Not sure why you ended up in this situation (maybe unlink partially
> > failed on this brick?)
>
> Neither am I; this was a completely fresh setup with 1-2 VMs and 1-2
> Proxmox LXC templates. I let it run for a few days and at some point it
> ended up in the mentioned state. I will continue to monitor and start
> filling the bricks with data.
> Thanks for your help!
>
> On Mon, Nov 1, 2021 at 02:54, Ravishankar N <ravishankar.n at pavilion.io> wrote:
>
>> On Mon, Nov 1, 2021 at 12:02 AM Thorsten Walk <darkiop at gmail.com> wrote:
>>
>>> Hi Ravi, the file only exists on pve01, and only once:
>>>
>>> [19:22:10] [ssh:root at pve01(192.168.1.50): ~ (700)]
>>> ># stat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>>>   File: /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>>>   Size: 6            Blocks: 8          IO Block: 4096   regular file
>>> Device: fd12h/64786d    Inode: 528         Links: 1
>>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>>> Access: 2021-10-30 14:34:50.385893588 +0200
>>> Modify: 2021-10-27 00:26:43.988756557 +0200
>>> Change: 2021-10-27 00:26:43.988756557 +0200
>>>  Birth: -
>>>
>>> [19:24:41] [ssh:root at pve01(192.168.1.50): ~ (700)]
>>> ># ls -l /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>>> .rw-r--r-- root root 6B 4 days ago  /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>>>
>>> [19:24:54] [ssh:root at pve01(192.168.1.50): ~ (700)]
>>> ># cat /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>>> 28084
>>
>> Hi Thorsten, you can delete the file. From the file size and contents, it
>> looks like it belongs to ovirt sanlock. Not sure why you ended up in this
>> situation (maybe unlink partially failed on this brick?). You can check
>> the mount, brick and self-heal daemon logs for this gfid to see if you
>> find related error/warning messages.
>>
>> -Ravi
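For anyone landing in the same state: because a brick's .glusterfs/<xx>/<yy>/<gfid>
entry is a hard link to the real file (for regular files), the link count shows
whether a gfid reported by heal info still has a real path behind it. A rough
bash sketch only, assuming the brick path from the output above; GFID and BRICK
are illustrative shell variables, not anything Gluster provides:

    # run on the affected brick node (192.168.1.51 in this thread)
    BRICK=/data/glusterfs
    GFID=ade6f31c-b80b-457e-a054-6ca1548d9cd3

    # link count of 1 means the gfid entry is orphaned: no real file references it
    stat -c %h $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID

    # if the count is greater than 1, list the real path(s) sharing the inode
    find $BRICK -samefile $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID \
        -not -path "*/.glusterfs/*"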
Strahil Nikolov
2021-Nov-05 19:45 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
You can mount the volume via

    # mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol

And then obtain the path:

    getfattr -n trusted.glusterfs.pathinfo -e text /mnt/testvol/.gfid/<GFID>

Source: https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/

Best Regards,
Strahil Nikolov

On Fri, Nov 5, 2021 at 19:29, Thorsten Walk <darkiop at gmail.com> wrote:

> Hi Guys, I pushed some VMs to the GlusterFS storage this week and ran them
> there. [...]
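Applied to this thread, that would look roughly as follows. A sketch only: the
volume name glusterfs-1-volume and the host 192.168.1.51 are taken from the gvi
output above, and /mnt/gfid-lookup is an arbitrary, hypothetical mount point:

    mkdir -p /mnt/gfid-lookup
    mount -t glusterfs -o aux-gfid-mount 192.168.1.51:/glusterfs-1-volume /mnt/gfid-lookup

    # resolve one of the gfids reported by heal info to its path(s) on the bricks
    getfattr -n trusted.glusterfs.pathinfo -e text \
        /mnt/gfid-lookup/.gfid/ade6f31c-b80b-457e-a054-6ca1548d9cd3

    umount /mnt/gfid-lookup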