Thorsten Walk
2021-Oct-29 06:57 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
Hello GlusterFS Community,

I am using GlusterFS version 9.3 on two Intel NUCs and a Raspberry Pi as arbiter for a replicate volume. It serves as distributed storage for a Proxmox cluster. I use version 9.3 because I could not find a more recent ARM package for the RPI (Debian 11).

The partitions for the volume:

NUC1
nvme0n1                        259:0    0 465.8G  0 disk
└─vg_glusterfs-lv_glusterfs    253:18   0 465.8G  0 lvm  /data/glusterfs

NUC2
nvme0n1                        259:0    0 465.8G  0 disk
└─vg_glusterfs-lv_glusterfs    253:14   0 465.8G  0 lvm  /data/glusterfs

RPI
sda            8:0    1 29,8G  0 disk
└─sda1         8:1    1 29,8G  0 part /data/glusterfs

The volume was created with:

mkfs.xfs -f -i size=512 -n size=8192 -d su=128K,sw=10 -L GlusterFS /dev/vg_glusterfs/lv_glusterfs

gluster volume create glusterfs-1-volume transport tcp replica 3 arbiter 1 192.168.1.50:/data/glusterfs 192.168.1.51:/data/glusterfs 192.168.1.40:/data/glusterfs force

After a while the volume always ends up with files that cannot be healed (in the example below: <gfid:26c5396c-86ff-408d-9cda-106acd2b0768>).

Currently I have the GlusterFS volume in test mode with only 1-2 VMs running on it. So far there are no negative effects. Replication and self-heal basically work; only now and then something remains that cannot be healed.

Does anyone have an idea how to prevent or heal this? I have already completely rebuilt the volume, including the partitions and glusterd, to rule out leftovers from the old setup.

If you need more information, please contact me. Thanks a lot!

===============

And here is some more info about the volume and the healing attempts:

$ gstatus -ab
Cluster:
     Status: Healthy          GlusterFS: 9.3
     Nodes: 3/3               Volumes: 1/1

Volumes:
glusterfs-1-volume
     Replicate    Started (UP) - 3/3 Bricks Up - (Arbiter Volume)
                  Capacity: (1.82% used) 8.00 GiB/466.00 GiB (used/total)
                  Self-Heal:
                     192.168.1.50:/data/glusterfs (1 File(s) to heal).
                  Bricks:
                     Distribute Group 1:
                        192.168.1.50:/data/glusterfs   (Online)
                        192.168.1.51:/data/glusterfs   (Online)
                        192.168.1.40:/data/glusterfs   (Online)

$ gluster volume info
Volume Name: glusterfs-1-volume
Type: Replicate
Volume ID: f70d9b2c-b30d-4a36-b8ff-249c09c8b45d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.50:/data/glusterfs
Brick2: 192.168.1.51:/data/glusterfs
Brick3: 192.168.1.40:/data/glusterfs (arbiter)
Options Reconfigured:
cluster.lookup-optimize: off
server.keepalive-count: 5
server.keepalive-interval: 2
server.keepalive-time: 10
server.tcp-user-timeout: 20
network.ping-timeout: 20
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
performance.strict-o-direct: on
network.remote-dio: disable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on

$ gluster volume heal glusterfs-1-volume
Launching heal operation to perform index self heal on volume glusterfs-1-volume has been successful
Use heal info commands to check status.

$ gluster volume heal glusterfs-1-volume info
Brick 192.168.1.50:/data/glusterfs
<gfid:26c5396c-86ff-408d-9cda-106acd2b0768>
Status: Connected
Number of entries: 1

Brick 192.168.1.51:/data/glusterfs
Status: Connected
Number of entries: 0

Brick 192.168.1.40:/data/glusterfs
Status: Connected
Number of entries: 0

$ gluster volume heal glusterfs-1-volume info split-brain
Brick 192.168.1.50:/data/glusterfs
Status: Connected
Number of entries in split-brain: 0

Brick 192.168.1.51:/data/glusterfs
Status: Connected
Number of entries in split-brain: 0

Brick 192.168.1.40:/data/glusterfs
Status: Connected
Number of entries in split-brain: 0
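For completeness: besides the index heal launched above, the gluster CLI can also be asked for a full crawl and for a condensed per-brick summary. These commands were not run in this thread; this is only a sketch of the standard invocations for this volume:

    # walk the whole brick instead of only the heal index
    gluster volume heal glusterfs-1-volume full

    # condensed per-brick count of pending and split-brain entries
    gluster volume heal glusterfs-1-volume info summary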
Ravishankar N
2021-Oct-30 11:36 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
On Fri, Oct 29, 2021 at 12:28 PM Thorsten Walk <darkiop at gmail.com> wrote:

> After a while the volume always ends up with files that cannot be healed (in the example below: <gfid:26c5396c-86ff-408d-9cda-106acd2b0768>).
>
> Currently I have the GlusterFS volume in test mode with only 1-2 VMs running on it. So far there are no negative effects. Replication and self-heal basically work; only now and then something remains that cannot be healed.
>
> Does anyone have an idea how to prevent or heal this? I have already completely rebuilt the volume, including the partitions and glusterd, to rule out leftovers from the old setup.
>
> If you need more information, please contact me.

The next time this occurs, can you check whether disabling `cluster.eager-lock` helps heal the file? Also share the xattrs (e.g. `getfattr -d -m. -e hex /brick-path/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768`) output from all 3 bricks for the file or its gfid.

Regards,
Ravi
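For this particular volume the two checks would look roughly like the following. This is a sketch assuming the brick root is /data/glusterfs on all three nodes, as in the original post; the exact invocations are not part of the thread:

    # temporarily disable eager locking on the volume (set back to "enable" if it makes no difference)
    gluster volume set glusterfs-1-volume cluster.eager-lock disable

    # on each of 192.168.1.50, 192.168.1.51 and 192.168.1.40, dump the xattrs of the gfid entry on the brick
    getfattr -d -m. -e hex /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768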
Strahil Nikolov
2021-Oct-30 12:45 UTC
[Gluster-users] GlusterFS 9.3 - Replicate Volume (2 Bricks / 1 Arbiter) - Self-healing does not always work
Can you find the actual file on the brick via https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ ? Usually I use method 2 in such cases.

Then check the extended attributes on all bricks (including the arbiter):

getfattr -d -e hex -m. /gluster/brick/path/to/file/or/dir

Also, check glustershd.log on 192.168.1.51 & 192.168.1.40 for clues.

Best Regards,
Strahil Nikolov

On Fri, Oct 29, 2021 at 9:58, Thorsten Walk <darkiop at gmail.com> wrote:
> [...]
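For reference, one brick-side way to map a gfid like the one above back to a file name is to follow its entry under the brick's .glusterfs directory. This is only a sketch assuming the brick root /data/glusterfs; the troubleshooting page linked above describes this and the mount-based methods in more detail:

    # for a regular file, the gfid entry is a hard link to the actual file on the brick
    ls -li /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768

    # locate the path on the brick that shares the same inode
    find /data/glusterfs -samefile /data/glusterfs/.glusterfs/26/c5/26c5396c-86ff-408d-9cda-106acd2b0768 -not -path '*/.glusterfs/*'

On a sharded volume like this one, the gfid may also belong to a shard under the brick's /.shard directory, in which case the second command will point there rather than to a VM image file.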