a.schwibbe at gmx.net
2021-May-30 12:12 UTC
[Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
I am seeking help here after looking for solutions on the web for my distributed-replicated volume. My volume has been in operation since v3.10 and I upgraded it through to 7.9, replaced nodes, and replaced bricks without a problem. I love it.

Finally I wanted to extend my 6x2 distributed-replicated volume with arbiters for better split-brain protection. So I ran add-brick with replica 3 arbiter 1 (as I had a 6x2, I obviously added 6 arbiter bricks) and it successfully converted to 6 x (2 + 1), and self-heal immediately started. Looking good.

Version: 7.9
Number of Bricks: 6 x (2 + 1) = 18
cluster.max-op-version: 70200
Peers: 3 (node[0..2])

Layout:
|node0  |node1  |node2
|brick0 |brick0 |arbit0
|arbit1 |brick1 |brick1
....

I then recognized that the arbiter bricks on node0 & node1 have been healed successfully. Unfortunately, all arbiter bricks on node2 have not been healed! I realized that the main dir on my arbiter mount point has been created (mount point /var/brick/arb_0 now contains the dir "brick"); however, this dir on _all_ other bricks has numeric owner ID 33, but on this one it has 0. The "brick" dir on the faulty arbiter bricks does contain ".glusterfs", but it has only very few entries. Other than that, "brick" is empty.

At that point I changed the brick dir owner with chown to 33:33 and hoped for self-heal to work. It did not. I hoped a rebalance fix-layout would fix things. It did not. I hoped a glusterd restart on node2 (as this is happening to both arbiter bricks on this node exclusively) would help. It did not.

Active mount points via nfs-ganesha or fuse continue to work. Existing clients cause errors in the arbiter-brick logs on node2 for missing files or dirs, but the clients seem not affected; r/w operations work. New clients are not able to fuse-mount the volume due to an "authentication error". heal statistics heal-count shows that several hundred files need healing, and this count is rising. Watching df on the arbiter-brick mount point on node2 shows every now and then a few bytes written, but they are removed immediately after that.

Any help/recommendation from you is highly appreciated.

Thank you!

A.
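(For reference, a rough sketch of the commands behind the steps described above; the volume name "myvol" is a placeholder and the arbiter brick paths other than /var/brick/arb_0 on node2 are assumptions, not taken from the report:)

    # convert the 6x2 volume to 6x(2+1) by adding one arbiter brick per replica pair (6 in total)
    gluster volume add-brick myvol replica 3 arbiter 1 \
        node2:/var/brick/arb_0/brick node0:/var/brick/arb_1/brick ...

    # attempted fixes described above
    chown 33:33 /var/brick/arb_0/brick                  # match the numeric owner 33:33 seen on the other bricks
    gluster volume rebalance myvol fix-layout start     # run from any node
    systemctl restart glusterd                          # on node2

    # check pending heals
    gluster volume heal myvol statistics heal-count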
Strahil Nikolov
2021-May-31 03:23 UTC
[Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Can you provide "gluster volume info", "gluster volume status", "gluster volume heal <VOLUME> info summary", and most probably "gluster volume status all clients/client-list"?

Best Regards,
Strahil Nikolov
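(For convenience, the requested output corresponds roughly to these commands; <VOLUME> stands in for the actual volume name:)

    gluster volume info <VOLUME>
    gluster volume status <VOLUME>
    gluster volume heal <VOLUME> info summary
    gluster volume status all clients        # or: gluster volume status all client-list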