Strahil Nikolov
2021-Oct-30 17:17 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
Hi,

based on the output it seems that for some reason the file was created locally but not on the 2nd brick or the arbiter, which is strange for a 'replica 3 arbiter 1' (a.k.a. replica 2 arbiter 1) volume.

It seems that cluster.eager-lock is enabled as per the virt group:
https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example

@Ravi, do you think that it should not be enabled by default in the virt group?

Best Regards,
Strahil Nikolov

On Sat, Oct 30, 2021 at 16:14, Thorsten Walk <darkiop at gmail.com> wrote:

Hi Ravi & Strahil, thanks a lot for your answer!

The file in the path .glusterfs/26/c5/.. only exists on node1 (=pve01). On node2 (pve02) and the arbiter (freya), the file does not exist:

[14:35:48] [ssh:root@pve01(192.168.1.50): ~ (700)]
# getfattr -d -m. -e hex /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.glusterfs-1-volume-client-1=0x000000010000000100000000
trusted.afr.glusterfs-1-volume-client-2=0x000000010000000100000000
trusted.gfid=0x26c5396c86ff408d9cda106acd2b0768
trusted.glusterfs.mdata=0x01000000000000000000000000617880a3000000003b2f011700000000617880a3000000003b2f011700000000617880a3000000003983a635

[14:36:49] [ssh:root@pve02(192.168.1.51): /data/glusterfs/.glusterfs/26/c5 (700)]
# ll
drwx------ root root    6B 3 days ago    ./
drwx------ root root  8.0K 6 hours ago   ../

[14:36:58] [ssh:root@freya(192.168.1.40): /data/glusterfs/.glusterfs/26/c5 (700)]
# ll
drwx------ root root    6B 3 days ago    ./
drwx------ root root  8.0K 3 hours ago   ../

After this, I disabled the option you mentioned:

gluster volume set glusterfs-1-volume cluster.eager-lock off

Then I started another healing process manually, unfortunately without success.

@Strahil: For your idea with https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ I need more time; maybe I can try it tomorrow. I'll be in touch.

Thanks again and best regards,
Thorsten
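For reference, the two trusted.afr.glusterfs-1-volume-client-{1,2} values above are AFR's pending-operation counters (three big-endian 32-bit fields: data, metadata, entry), so 0x000000010000000100000000 means the copy on pve01 records one pending data and one pending metadata operation against each of the other two bricks, i.e. it considers itself the good copy. A minimal sketch for double-checking the option change and the pending-heal state from any node (volume name taken from the commands above):

# Verify that eager-lock is now off for this volume.
gluster volume get glusterfs-1-volume cluster.eager-lock

# List entries still pending heal, per brick.
gluster volume heal glusterfs-1-volume info

# List entries in split-brain, which need manual resolution.
gluster volume heal glusterfs-1-volume info split-brain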
Ravishankar N
2021-Oct-31 06:35 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
On Sat, Oct 30, 2021 at 10:47 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> Hi,
>
> based on the output it seems that for some reason the file was created
> locally but not on the 2nd brick or the arbiter, which is strange for a
> 'replica 3 arbiter 1' (a.k.a. replica 2 arbiter 1) volume.
>
> It seems that cluster.eager-lock is enabled as per the virt group:
> https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example
>
> @Ravi, do you think that it should not be enabled by default in the virt group?

It should be enabled alright, but we have noticed some issues of stale locks (https://github.com/gluster/glusterfs/issues/{2198,2211,2027}) which could prevent self-heal (or any other I/O that takes a blocking lock) from happening. But the problem here is different, as you noticed. Thorsten needs to find the actual file (`find -samefile`) corresponding to this gfid and check its file size, hard-link count, etc. If it is a zero-byte file, then it should be safe to just delete the file and its hardlink from the brick.

Regards,
Ravi
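A minimal sketch of the check Ravi describes, run directly on the pve01 brick and assuming the brick root /data/glusterfs from Thorsten's output (adjust names and paths to your setup):

GFID_FILE=/data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768

# Size and hard-link count; a healthy regular file on a brick has at
# least two links (the real path plus its .glusterfs hardlink).
stat -c 'size=%s links=%h' "$GFID_FILE"

# Resolve the gfid to its real path by matching the inode.
find /data/glusterfs -samefile "$GFID_FILE" -not -path '*/.glusterfs/*'

# If it is a stale zero-byte file, delete the file and its hardlink from
# the brick, as Ravi suggests, then re-trigger a heal:
#   rm "$GFID_FILE" /data/glusterfs/<path-found-above>
#   gluster volume heal glusterfs-1-volume

If find returns nothing outside .glusterfs, the gfid entry is the only remaining hardlink (link count 1), which would match the stale, safe-to-remove case Ravi mentions.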