Ravishankar N
2021-Oct-31 06:35 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
On Sat, Oct 30, 2021 at 10:47 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> Hi,
>
> based on the output it seems that for some reason the file was deployed
> locally but not on the 2nd brick and the arbiter, which is strange for a
> 'replica 3 arbiter 1' (a.k.a. replica 2 arbiter 1) volume.
>
> It seems that cluster.eager-lock is enabled as per the virt group:
> https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example
>
> @Ravi,
>
> do you think that it should not be enabled by default in the virt group?

It should be enabled alright, but we have noticed some issues with stale
locks (https://github.com/gluster/glusterfs/issues/ {2198, 2211, 2027})
which could prevent self-heal (or any other I/O that takes a blocking lock)
from happening. But the problem here is different, as you noticed. Thorsten
needs to find the actual file (`find -samefile`) corresponding to this gfid
and check its file size, hard-link count, etc. If it is a zero-byte file,
then it should be safe to just delete the file and its hardlink from the
brick.

Regards,
Ravi

> Best Regards,
> Strahil Nikolov
>
> On Sat, Oct 30, 2021 at 16:14, Thorsten Walk <darkiop at gmail.com> wrote:
> Hi Ravi & Strahil, thanks a lot for your answers!
>
> The file in the path .glusterfs/26/c5/.. only exists on node1 (=pve01). On
> node2 (pve02) and the arbiter (freya), the file does not exist:
>
> [14:35:48] [ssh:root at pve01(192.168.1.50): ~ (700)]
> # getfattr -d -m. -e hex /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
> getfattr: Removing leading '/' from absolute path names
> # file: data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.glusterfs-1-volume-client-1=0x000000010000000100000000
> trusted.afr.glusterfs-1-volume-client-2=0x000000010000000100000000
> trusted.gfid=0x26c5396c86ff408d9cda106acd2b0768
> trusted.glusterfs.mdata=0x01000000000000000000000000617880a3000000003b2f011700000000617880a3000000003b2f011700000000617880a3000000003983a635
>
> [14:36:49] [ssh:root at pve02(192.168.1.51): /data/glusterfs/.glusterfs/26/c5 (700)]
> # ll
> drwx------ root root    6B 3 days ago   ./
> drwx------ root root  8.0K 6 hours ago  ../
>
> [14:36:58] [ssh:root at freya(192.168.1.40): /data/glusterfs/.glusterfs/26/c5 (700)]
> # ll
> drwx------ root root    6B 3 days ago   ./
> drwx------ root root  8.0K 3 hours ago  ../
>
> After this, I disabled the option you mentioned:
>
> gluster volume set glusterfs-1-volume cluster.eager-lock off
>
> After that I started another healing process manually, unfortunately
> without success.
>
> @Strahil: For your idea with
> https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ I need
> more time; maybe I can try it tomorrow. I'll be in touch.
>
> Thanks again and best regards,
> Thorsten
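[Editor's note: Ravi's `find -samefile` suggestion can be tried out safely on a simulated brick. The sketch below builds a throwaway directory mimicking the brick layout (a real file hardlinked under `.glusterfs/26/c5/<gfid>`, using the gfid from this thread); the file name `images/disk.img` is purely hypothetical. On the real brick you would run the `find` and `stat` lines against /data/glusterfs instead.]

```shell
#!/bin/sh
set -e
# Simulated brick; stands in for /data/glusterfs on pve01.
brick=$(mktemp -d)
gfid=26c5396c-86ff-408d-9cda-106acd2b0768

# A healthy brick stores each file under its real name plus a hardlink
# under .glusterfs/<first 2 hex chars of gfid>/<next 2>/<full gfid>.
mkdir -p "$brick/.glusterfs/26/c5" "$brick/images"
touch "$brick/images/disk.img"                        # hypothetical file name
ln "$brick/images/disk.img" "$brick/.glusterfs/26/c5/$gfid"

# List every path sharing the inode of the gfid hardlink:
find "$brick" -samefile "$brick/.glusterfs/26/c5/$gfid"

# Check hard-link count and size; links=1 would mean only the .glusterfs
# entry remains, and size=0 marks the zero-byte case Ravi describes.
stat -c 'links=%h size=%s' "$brick/.glusterfs/26/c5/$gfid"
# -> links=2 size=0

rm -rf "$brick"
```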
Thorsten Walk
2021-Oct-31 08:06 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
Hello,

> It should be enabled alright, but we have noticed some issues with stale
> locks (https://github.com/gluster/glusterfs/issues/ {2198, 2211, 2027})
> which could prevent self-heal (or any other I/O that takes a blocking
> lock) from happening.

I have re-enabled cluster.eager-lock.

> But the problem here is different, as you noticed. Thorsten needs to find
> the actual file (`find -samefile`) corresponding to this gfid and check
> its file size, hard-link count, etc. If it is a zero-byte file, then it
> should be safe to just delete the file and its hardlink from the brick.

I think here I need your help :) How can I find the file? I only have the
gfid from the output of 'gluster volume heal glusterfs-1-volume info' =
<gfid:26c5396c-86ff-408d-9cda-106acd2b0768> on Brick 192.168.1.50:
/data/glusterfs.

Thanks and regards,
Thorsten

On Sun, Oct 31, 2021 at 07:35, Ravishankar N <ravishankar.n at pavilion.io> wrote:

> On Sat, Oct 30, 2021 at 10:47 PM Strahil Nikolov <hunter86_bg at yahoo.com>
> wrote:
>
>> Hi,
>>
>> based on the output it seems that for some reason the file was deployed
>> locally but not on the 2nd brick and the arbiter, which is strange for a
>> 'replica 3 arbiter 1' (a.k.a. replica 2 arbiter 1) volume.
>>
>> It seems that cluster.eager-lock is enabled as per the virt group:
>> https://github.com/gluster/glusterfs/blob/devel/extras/group-virt.example
>>
>> @Ravi,
>>
>> do you think that it should not be enabled by default in the virt group?
>
> It should be enabled alright, but we have noticed some issues with stale
> locks (https://github.com/gluster/glusterfs/issues/ {2198, 2211, 2027})
> which could prevent self-heal (or any other I/O that takes a blocking
> lock) from happening. But the problem here is different, as you noticed.
> Thorsten needs to find the actual file (`find -samefile`) corresponding
> to this gfid and check its file size, hard-link count, etc. If it is a
> zero-byte file, then it should be safe to just delete the file and its
> hardlink from the brick.
>
> Regards,
> Ravi
>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Sat, Oct 30, 2021 at 16:14, Thorsten Walk <darkiop at gmail.com> wrote:
>> Hi Ravi & Strahil, thanks a lot for your answers!
>>
>> The file in the path .glusterfs/26/c5/.. only exists on node1 (=pve01).
>> On node2 (pve02) and the arbiter (freya), the file does not exist:
>>
>> [14:35:48] [ssh:root at pve01(192.168.1.50): ~ (700)]
>> # getfattr -d -m. -e hex /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.glusterfs-1-volume-client-1=0x000000010000000100000000
>> trusted.afr.glusterfs-1-volume-client-2=0x000000010000000100000000
>> trusted.gfid=0x26c5396c86ff408d9cda106acd2b0768
>> trusted.glusterfs.mdata=0x01000000000000000000000000617880a3000000003b2f011700000000617880a3000000003b2f011700000000617880a3000000003983a635
>>
>> [14:36:49] [ssh:root at pve02(192.168.1.51): /data/glusterfs/.glusterfs/26/c5 (700)]
>> # ll
>> drwx------ root root    6B 3 days ago   ./
>> drwx------ root root  8.0K 6 hours ago  ../
>>
>> [14:36:58] [ssh:root at freya(192.168.1.40): /data/glusterfs/.glusterfs/26/c5 (700)]
>> # ll
>> drwx------ root root    6B 3 days ago   ./
>> drwx------ root root  8.0K 3 hours ago  ../
>>
>> After this, I disabled the option you mentioned:
>>
>> gluster volume set glusterfs-1-volume cluster.eager-lock off
>>
>> After that I started another healing process manually, unfortunately
>> without success.
>>
>> @Strahil: For your idea with
>> https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ I need
>> more time; maybe I can try it tomorrow. I'll be in touch.
>>
>> Thanks again and best regards,
>> Thorsten
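[Editor's note: Thorsten's "how can I find the file?" has a mechanical answer that follows from the .glusterfs layout shown earlier in the thread: a gfid maps to .glusterfs/<first 2 hex chars>/<next 2 chars>/<full gfid> on the brick. A minimal sketch, using the brick path and gfid from this thread; the `find` line is shown as a comment because it must be run as root on the brick server (pve01) itself.]

```shell
#!/bin/sh
# Derive the backend path for a gfid reported by `gluster volume heal ... info`.
brick=/data/glusterfs
gfid=26c5396c-86ff-408d-9cda-106acd2b0768

backend="$brick/.glusterfs/$(printf %s "$gfid" | cut -c1-2)/$(printf %s "$gfid" | cut -c3-4)/$gfid"
echo "$backend"
# -> /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768

# On pve01, list every real path sharing that inode (the `find -samefile`
# step Ravi suggested), then check its link count and size:
#   find "$brick" -samefile "$backend"
#   stat -c 'links=%h size=%s' "$backend"
```

The gfid-to-path document Strahil linked also describes an alternative that avoids walking the brick: mount the volume with `-o aux-gfid-mount` and query `trusted.glusterfs.pathinfo` on the virtual `.gfid/<gfid>` path.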