Strahil Nikolov
2021-Oct-30 17:17 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
Hi,

based on the output it seems that for some reason the file was created locally but not on the 2nd brick or the arbiter, which is strange for a 'replica 3 arbiter 1' (a.k.a. replica 2 arbiter 1) volume.

It seems that cluster.eager-lock is enabled as per the virt group:
https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example

@Ravi, do you think that it should not be enabled by default in the virt group?

Best Regards,
Strahil Nikolov

On Sat, Oct 30, 2021 at 16:14, Thorsten Walk <darkiop at gmail.com> wrote:

Hi Ravi & Strahil, thanks a lot for your answer!

The file in the path .glusterfs/26/c5/.. only exists on node1 (=pve01). On node2 (pve02) and the arbiter (freya), the file does not exist:

[14:35:48] [ssh:root@pve01(192.168.1.50): ~ (700)]
# getfattr -d -m. -e hex /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.glusterfs-1-volume-client-1=0x000000010000000100000000
trusted.afr.glusterfs-1-volume-client-2=0x000000010000000100000000
trusted.gfid=0x26c5396c86ff408d9cda106acd2b0768
trusted.glusterfs.mdata=0x01000000000000000000000000617880a3000000003b2f011700000000617880a3000000003b2f011700000000617880a3000000003983a635

[14:36:49] [ssh:root@pve02(192.168.1.51): /data/glusterfs/.glusterfs/26/c5 (700)]
# ll
drwx------ root root    6B 3 days ago    ./
drwx------ root root  8.0K 6 hours ago   ../

[14:36:58] [ssh:root@freya(192.168.1.40): /data/glusterfs/.glusterfs/26/c5 (700)]
# ll
drwx------ root root    6B 3 days ago    ./
drwx------ root root  8.0K 3 hours ago   ../

After this, I disabled the option you mentioned:

gluster volume set glusterfs-1-volume cluster.eager-lock off

Then I started another healing process manually, unfortunately without success.

@Strahil: For your idea with https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ I need more time; maybe I can try it tomorrow. I'll be in touch.

Thanks again and best regards,
Thorsten
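For reference, the two trusted.afr.glusterfs-1-volume-client-{1,2} values above are AFR's pending-operation counters (three big-endian 32-bit fields: data, metadata, entry), so 0x000000010000000100000000 means the copy on pve01 records one pending data and one pending metadata operation against each of the other two bricks, i.e. it considers itself the good copy. A minimal sketch for double-checking the option change and the pending-heal state from any node (volume name taken from the commands above):

# Verify that eager-lock is now off for this volume.
gluster volume get glusterfs-1-volume cluster.eager-lock

# List entries still pending heal, per brick.
gluster volume heal glusterfs-1-volume info

# List entries in split-brain, which need manual resolution.
gluster volume heal glusterfs-1-volume info split-brain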
Ravishankar N
2021-Oct-31 06:35 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
On Sat, Oct 30, 2021 at 10:47 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> Hi,
>
> based on the output it seems that for some reason the file was created
> locally but not on the 2nd brick or the arbiter, which is strange for a
> 'replica 3 arbiter 1' (a.k.a. replica 2 arbiter 1) volume.
>
> It seems that cluster.eager-lock is enabled as per the virt group:
> https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example
>
> @Ravi, do you think that it should not be enabled by default in the virt group?

It should be enabled alright, but we have noticed some issues of stale locks (https://github.com/gluster/glusterfs/issues/{2198,2211,2027}) which could prevent self-heal (or any other I/O that takes a blocking lock) from happening. But the problem here is different, as you noticed. Thorsten needs to find the actual file (`find -samefile`) corresponding to this gfid and check its file size, hard-link count, etc. If it is a zero-byte file, then it should be safe to just delete the file and its hardlink from the brick.

Regards,
Ravi
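A minimal sketch of the check Ravi describes, run directly on the pve01 brick and assuming the brick root /data/glusterfs from Thorsten's output (adjust names and paths to your setup):

GFID_FILE=/data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768

# Size and hard-link count; a healthy regular file on a brick has at
# least two links (the real path plus its .glusterfs hardlink).
stat -c 'size=%s links=%h' "$GFID_FILE"

# Resolve the gfid to its real path by matching the inode.
find /data/glusterfs -samefile "$GFID_FILE" -not -path '*/.glusterfs/*'

# If it is a stale zero-byte file, delete the file and its hardlink from
# the brick, as Ravi suggests, then re-trigger a heal:
#   rm "$GFID_FILE" /data/glusterfs/<path-found-above>
#   gluster volume heal glusterfs-1-volume

If find returns nothing outside .glusterfs, the gfid entry is the only remaining hardlink (link count 1), which would match the stale, safe-to-remove case Ravi mentions.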