Ravishankar N
2021-Oct-30 11:36 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
On Fri, Oct 29, 2021 at 12:28 PM Thorsten Walk <darkiop at gmail.com> wrote:

> After a certain time it always reaches a state where there are files in
> the GFS that cannot be healed (in the example below:
> <gfid:26c5396c-86ff-408d-9cda-106acd2b0768>).
>
> Currently I have the GlusterFS volume in test mode with only 1-2 VMs
> running on it. So far there are no negative effects. Replication and
> self-heal basically work; only now and then something remains that
> cannot be healed.
>
> Does anyone have an idea how to prevent or heal this? I have already
> completely rebuilt the volume, incl. partitions and glusterd, to rule
> out leftovers from the old setup.
>
> If you need more information, please contact me.

The next time this occurs, can you check whether disabling `cluster.eager-lock` helps heal the file? Also share the xattrs output from all 3 bricks for the file or its gfid, e.g.:

getfattr -d -m. -e hex /brick-path/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768

Regards,
Ravi
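A minimal sketch of the suggested check (<VOLNAME> is a placeholder for the actual volume name, and the brick path must be adjusted for each node):

# Disable eager locking (reversible later with "on"):
gluster volume set <VOLNAME> cluster.eager-lock off

# Trigger a heal and re-check the list of entries pending heal:
gluster volume heal <VOLNAME>
gluster volume heal <VOLNAME> info

# Then, on each of the three bricks, dump the AFR xattrs for the gfid:
getfattr -d -m. -e hex /brick-path/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768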
Thorsten Walk
2021-Oct-30 13:13 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
Hi Ravi & Strahil, thanks a lot for your answers!

The file in the path .glusterfs/26/c5/.. only exists on node1 (=pve01). On node2 (pve02) and the arbiter (freya), the file does not exist:

[14:35:48] [ssh:root@pve01(192.168.1.50): ~ (700)]
# getfattr -d -m. -e hex /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.glusterfs-1-volume-client-1=0x000000010000000100000000
trusted.afr.glusterfs-1-volume-client-2=0x000000010000000100000000
trusted.gfid=0x26c5396c86ff408d9cda106acd2b0768
trusted.glusterfs.mdata=0x01000000000000000000000000617880a3000000003b2f011700000000617880a3000000003b2f011700000000617880a3000000003983a635

[14:36:49] [ssh:root@pve02(192.168.1.51): /data/glusterfs/.glusterfs/26/c5 (700)]
# ll
drwx------ root root   6B  3 days ago   ./
drwx------ root root 8.0K  6 hours ago  ../

[14:36:58] [ssh:root@freya(192.168.1.40): /data/glusterfs/.glusterfs/26/c5 (700)]
# ll
drwx------ root root   6B  3 days ago   ./
drwx------ root root 8.0K  3 hours ago  ../

After this, I disabled the option you mentioned:

gluster volume set glusterfs-1-volume cluster.eager-lock off

After that I started another healing process manually, unfortunately without success.

@Strahil: For your idea with https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ I need more time; maybe I can try it tomorrow. I'll be in touch.

Thanks again and best regards,
Thorsten
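For the gfid-to-path step from the linked doc, a minimal sketch of its aux-gfid-mount method (the hostname pve01 and the mountpoint /mnt/gfid-resolve are assumptions; the volume name is taken from above):

# Mount the volume with gfid access enabled (assumed host and mountpoint):
mount -t glusterfs -o aux-gfid-mount pve01:/glusterfs-1-volume /mnt/gfid-resolve

# Resolve the gfid to its path(s) on the backend bricks:
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/gfid-resolve/.gfid/26c5396c-86ff-408d-9cda-106acd2b0768

Since the file currently exists only on pve01's brick, the pathinfo output may list only that brick.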