Ravishankar N
2021-Oct-31 06:35 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
On Sat, Oct 30, 2021 at 10:47 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> Hi,
>
> based on the output it seems that for some reason the file was deployed
> locally but not on the 2nd brick and the arbiter, which is strange for a
> 'replica 3 arbiter 1' (a.k.a. replica 2 arbiter 1) volume.
>
> It seems that cluster.eager-lock is enabled as per the virt group:
> https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example
>
> @Ravi,
>
> do you think that it should not be enabled by default in the virt group?

It should be enabled alright, but we have noticed some issues with stale
locks (https://github.com/gluster/glusterfs/issues/ {2198, 2211, 2027})
which could prevent self-heal (or any other I/O that takes a blocking lock)
from happening. But the problem here is different, as you noticed. Thorsten
needs to find the actual file (`find -samefile`) corresponding to this gfid
and check its file size, hard-link count, etc. If it is a zero-byte file,
then it should be safe to just delete the file and its hardlink from the
brick.

Regards,
Ravi

> Best Regards,
> Strahil Nikolov
>
> On Sat, Oct 30, 2021 at 16:14, Thorsten Walk <darkiop at gmail.com> wrote:
> Hi Ravi & Strahil, thanks a lot for your answers!
>
> The file in the path .glusterfs/26/c5/.. only exists on node1 (=pve01). On
> node2 (pve02) and the arbiter (freya), the file does not exist:
>
> [14:35:48] [ssh:root at pve01(192.168.1.50): ~ (700)]
> # getfattr -d -m. -e hex /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
> getfattr: Removing leading '/' from absolute path names
> # file: data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.glusterfs-1-volume-client-1=0x000000010000000100000000
> trusted.afr.glusterfs-1-volume-client-2=0x000000010000000100000000
> trusted.gfid=0x26c5396c86ff408d9cda106acd2b0768
> trusted.glusterfs.mdata=0x01000000000000000000000000617880a3000000003b2f011700000000617880a3000000003b2f011700000000617880a3000000003983a635
>
> [14:36:49] [ssh:root at pve02(192.168.1.51): /data/glusterfs/.glusterfs/26/c5 (700)]
> # ll
> drwx------ root root    6B 3 days ago   ./
> drwx------ root root  8.0K 6 hours ago  ../
>
> [14:36:58] [ssh:root at freya(192.168.1.40): /data/glusterfs/.glusterfs/26/c5 (700)]
> # ll
> drwx------ root root    6B 3 days ago   ./
> drwx------ root root  8.0K 3 hours ago  ../
>
> After this, I disabled the option you mentioned:
>
> gluster volume set glusterfs-1-volume cluster.eager-lock off
>
> After that I started another healing process manually, unfortunately
> without success.
>
> @Strahil: For your idea with
> https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ I need
> more time; maybe I can try it tomorrow. I'll be in touch.
>
> Thanks again and best regards,
> Thorsten
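[Editor's note: Ravi's `find -samefile` suggestion can be tried out safely on a simulated brick. The sketch below builds a throwaway directory mimicking the brick layout (a real file hardlinked under `.glusterfs/26/c5/<gfid>`, using the gfid from this thread); the file name `images/disk.img` is purely hypothetical. On the real brick you would run the `find` and `stat` lines against /data/glusterfs instead.]

```shell
#!/bin/sh
set -e
# Simulated brick; stands in for /data/glusterfs on pve01.
brick=$(mktemp -d)
gfid=26c5396c-86ff-408d-9cda-106acd2b0768

# A healthy brick stores each file under its real name plus a hardlink
# under .glusterfs/<first 2 hex chars of gfid>/<next 2>/<full gfid>.
mkdir -p "$brick/.glusterfs/26/c5" "$brick/images"
touch "$brick/images/disk.img"                        # hypothetical file name
ln "$brick/images/disk.img" "$brick/.glusterfs/26/c5/$gfid"

# List every path sharing the inode of the gfid hardlink:
find "$brick" -samefile "$brick/.glusterfs/26/c5/$gfid"

# Check hard-link count and size; links=1 would mean only the .glusterfs
# entry remains, and size=0 marks the zero-byte case Ravi describes.
stat -c 'links=%h size=%s' "$brick/.glusterfs/26/c5/$gfid"
# -> links=2 size=0

rm -rf "$brick"
```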
Thorsten Walk
2021-Oct-31 08:06 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
Hello,

> It should be enabled alright, but we have noticed some issues with stale
> locks (https://github.com/gluster/glusterfs/issues/ {2198, 2211, 2027})
> which could prevent self-heal (or any other I/O that takes a blocking
> lock) from happening.

I have re-enabled cluster.eager-lock.

> But the problem here is different, as you noticed. Thorsten needs to find
> the actual file (`find -samefile`) corresponding to this gfid and check
> its file size, hard-link count, etc. If it is a zero-byte file, then it
> should be safe to just delete the file and its hardlink from the brick.

I think here I need your help :) How can I find the file? I only have the
gfid from the output of 'gluster volume heal glusterfs-1-volume info' =
<gfid:26c5396c-86ff-408d-9cda-106acd2b0768> on Brick 192.168.1.50:
/data/glusterfs.

Thanks and regards,
Thorsten

On Sun, Oct 31, 2021 at 07:35, Ravishankar N <ravishankar.n at pavilion.io> wrote:

> On Sat, Oct 30, 2021 at 10:47 PM Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
>
>> Hi,
>>
>> based on the output it seems that for some reason the file was deployed
>> locally but not on the 2nd brick and the arbiter, which is strange for a
>> 'replica 3 arbiter 1' (a.k.a. replica 2 arbiter 1) volume.
>>
>> It seems that cluster.eager-lock is enabled as per the virt group:
>> https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example
>>
>> @Ravi,
>>
>> do you think that it should not be enabled by default in the virt group?
>
> It should be enabled alright, but we have noticed some issues with stale
> locks (https://github.com/gluster/glusterfs/issues/ {2198, 2211, 2027})
> which could prevent self-heal (or any other I/O that takes a blocking
> lock) from happening. But the problem here is different, as you noticed.
> Thorsten needs to find the actual file (`find -samefile`) corresponding
> to this gfid and check its file size, hard-link count, etc. If it is a
> zero-byte file, then it should be safe to just delete the file and its
> hardlink from the brick.
>
> Regards,
> Ravi
>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Sat, Oct 30, 2021 at 16:14, Thorsten Walk <darkiop at gmail.com> wrote:
>> Hi Ravi & Strahil, thanks a lot for your answers!
>>
>> The file in the path .glusterfs/26/c5/.. only exists on node1 (=pve01).
>> On node2 (pve02) and the arbiter (freya), the file does not exist:
>>
>> [14:35:48] [ssh:root at pve01(192.168.1.50): ~ (700)]
>> # getfattr -d -m. -e hex /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.glusterfs-1-volume-client-1=0x000000010000000100000000
>> trusted.afr.glusterfs-1-volume-client-2=0x000000010000000100000000
>> trusted.gfid=0x26c5396c86ff408d9cda106acd2b0768
>> trusted.glusterfs.mdata=0x01000000000000000000000000617880a3000000003b2f011700000000617880a3000000003b2f011700000000617880a3000000003983a635
>>
>> [14:36:49] [ssh:root at pve02(192.168.1.51): /data/glusterfs/.glusterfs/26/c5 (700)]
>> # ll
>> drwx------ root root    6B 3 days ago   ./
>> drwx------ root root  8.0K 6 hours ago  ../
>>
>> [14:36:58] [ssh:root at freya(192.168.1.40): /data/glusterfs/.glusterfs/26/c5 (700)]
>> # ll
>> drwx------ root root    6B 3 days ago   ./
>> drwx------ root root  8.0K 3 hours ago  ../
>>
>> After this, I disabled the option you mentioned:
>>
>> gluster volume set glusterfs-1-volume cluster.eager-lock off
>>
>> After that I started another healing process manually, unfortunately
>> without success.
>>
>> @Strahil: For your idea with
>> https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ I need
>> more time; maybe I can try it tomorrow. I'll be in touch.
>>
>> Thanks again and best regards,
>> Thorsten
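[Editor's note: Thorsten's "how can I find the file?" has a mechanical answer that follows from the .glusterfs layout shown earlier in the thread: a gfid maps to .glusterfs/<first 2 hex chars>/<next 2 chars>/<full gfid> on the brick. A minimal sketch, using the brick path and gfid from this thread; the `find` line is shown as a comment because it must be run as root on the brick server (pve01) itself.]

```shell
#!/bin/sh
# Derive the backend path for a gfid reported by `gluster volume heal ... info`.
brick=/data/glusterfs
gfid=26c5396c-86ff-408d-9cda-106acd2b0768

backend="$brick/.glusterfs/$(printf %s "$gfid" | cut -c1-2)/$(printf %s "$gfid" | cut -c3-4)/$gfid"
echo "$backend"
# -> /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768

# On pve01, list every real path sharing that inode (the `find -samefile`
# step Ravi suggested), then check its link count and size:
#   find "$brick" -samefile "$backend"
#   stat -c 'links=%h size=%s' "$backend"
```

The gfid-to-path document Strahil linked also describes an alternative that avoids walking the brick: mount the volume with `-o aux-gfid-mount` and query `trusted.glusterfs.pathinfo` on the virtual `.gfid/<gfid>` path.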