Strahil Nikolov
2021-May-31 09:11 UTC
[Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
For the arb_0 I see only 8 clients, while there should be 12 clients:

Brick : 192.168.0.40:/var/bricks/0/brick
Clients connected : 12

Brick : 192.168.0.41:/var/bricks/0/brick
Clients connected : 12

Brick : 192.168.0.80:/var/bricks/arb_0/brick
Clients connected : 8

Can you try to reconnect them? The simplest way is to kill the arbiter process and run 'gluster volume start gv0 force', but always verify first that both data bricks are up and running.

Yet, this doesn't explain why the heal daemon is not able to replicate properly.

Best Regards,
Strahil Nikolov

> Meanwhile I tried reset-brick on one of the failing arbiters on node2, but with the same results. The behaviour is reproducible, the arbiter stays empty.
>
> node0: 192.168.0.40
> node1: 192.168.0.41
> node2: 192.168.0.80
>
> volume info:
>
> Volume Name: gv0
> Type: Distributed-Replicate
> Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6 x (2 + 1) = 18
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.0.40:/var/bricks/0/brick
> Brick2: 192.168.0.41:/var/bricks/0/brick
> Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter)
> Brick4: 192.168.0.40:/var/bricks/2/brick
> Brick5: 192.168.0.80:/var/bricks/2/brick
> Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter)
> Brick7: 192.168.0.40:/var/bricks/1/brick
> Brick8: 192.168.0.41:/var/bricks/1/brick
> Brick9: 192.168.0.80:/var/bricks/arb_1/brick (arbiter)
> Brick10: 192.168.0.40:/var/bricks/3/brick
> Brick11: 192.168.0.80:/var/bricks/3/brick
> Brick12: 192.168.0.41:/var/bricks/arb_0/brick (arbiter)
> Brick13: 192.168.0.41:/var/bricks/3/brick
> Brick14: 192.168.0.80:/var/bricks/4/brick
> Brick15: 192.168.0.40:/var/bricks/arb_0/brick (arbiter)
> Brick16: 192.168.0.41:/var/bricks/2/brick
> Brick17: 192.168.0.80:/var/bricks/5/brick
> Brick18: 192.168.0.40:/var/bricks/arb_1/brick (arbiter)
> Options Reconfigured:
> cluster.min-free-inodes: 6%
> cluster.min-free-disk: 2%
> performance.md-cache-timeout: 600
> cluster.rebal-throttle: lazy
> features.scrub-freq: monthly
> features.scrub-throttle: lazy
> features.scrub: Inactive
> features.bitrot: off
> cluster.server-quorum-type: none
> performance.cache-refresh-timeout: 10
> performance.cache-max-file-size: 64MB
> performance.cache-size: 781901824
> auth.allow: /(192.168.0.*),/usr/andreas(192.168.0.120),/usr/otis(192.168.0.168),/usr/otis(192.168.0.111),/usr/otis(192.168.0.249),/media(192.168.0.*),/virt(192.168.0.*),/cloud(192.168.0.247),/zm(192.168.0.136)
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> cluster.quorum-type: auto
> features.cache-invalidation: on
> nfs.disable: on
> transport.address-family: inet
> cluster.self-heal-daemon: on
> cluster.server-quorum-ratio: 51%
>
> volume status:
>
> Status of volume: gv0
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.0.40:/var/bricks/0/brick      49155     0          Y       713066
> Brick 192.168.0.41:/var/bricks/0/brick      49152     0          Y       2082
> Brick 192.168.0.80:/var/bricks/arb_0/brick  49152     0          Y       26186
> Brick 192.168.0.40:/var/bricks/2/brick      49156     0          Y       713075
> Brick 192.168.0.80:/var/bricks/2/brick      49154     0          Y       325
> Brick 192.168.0.41:/var/bricks/arb_1/brick  49157     0          Y       1746903
> Brick 192.168.0.40:/var/bricks/1/brick      49157     0          Y       713084
> Brick 192.168.0.41:/var/bricks/1/brick      49153     0          Y       14104
> Brick 192.168.0.80:/var/bricks/arb_1/brick  49159     0          Y       2314
> Brick 192.168.0.40:/var/bricks/3/brick      49153     0          Y       2978692
> Brick 192.168.0.80:/var/bricks/3/brick      49155     0          Y       23269
> Brick 192.168.0.41:/var/bricks/arb_0/brick  49158     0          Y       1746942
> Brick 192.168.0.41:/var/bricks/3/brick      49155     0          Y       897058
> Brick 192.168.0.80:/var/bricks/4/brick      49156     0          Y       27433
> Brick 192.168.0.40:/var/bricks/arb_0/brick  49152     0          Y       3561115
> Brick 192.168.0.41:/var/bricks/2/brick      49156     0          Y       902602
> Brick 192.168.0.80:/var/bricks/5/brick      49157     0          Y       29522
> Brick 192.168.0.40:/var/bricks/arb_1/brick  49154     0          Y       3561159
> Self-heal Daemon on localhost               N/A       N/A        Y       26199
> Self-heal Daemon on 192.168.0.41            N/A       N/A        Y       2240635
> Self-heal Daemon on 192.168.0.40            N/A       N/A        Y       3912810
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> volume heal info summary:
>
> Brick 192.168.0.40:/var/bricks/0/brick   <--- contains 100177 files in 25015 dirs
> Status: Connected
> Total Number of entries: 1006
> Number of entries in heal pending: 1006
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.41:/var/bricks/0/brick
> Status: Connected
> Total Number of entries: 1006
> Number of entries in heal pending: 1006
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.80:/var/bricks/arb_0/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.40:/var/bricks/2/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.80:/var/bricks/2/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.41:/var/bricks/arb_1/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.40:/var/bricks/1/brick
> Status: Connected
> Total Number of entries: 1006
> Number of entries in heal pending: 1006
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.41:/var/bricks/1/brick
> Status: Connected
> Total Number of entries: 1006
> Number of entries in heal pending: 1006
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.80:/var/bricks/arb_1/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.40:/var/bricks/3/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.80:/var/bricks/3/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.41:/var/bricks/arb_0/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.41:/var/bricks/3/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.80:/var/bricks/4/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.40:/var/bricks/arb_0/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.41:/var/bricks/2/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.80:/var/bricks/5/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick 192.168.0.40:/var/bricks/arb_1/brick
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> client-list:
>
> Client connections for volume gv0
> Name                 count
> -----                ------
> fuse                 5
> gfapi.ganesha.nfsd   3
> glustershd           3
>
> total clients for volume gv0 : 11
> -----------------------------------------------------------------
>
> all clients: https://pro.hostit.de/nextcloud/index.php/s/tWdHox3aqb3qqbG
>
> failing mnt.log https://pro.hostit.de/nextcloud/index.php/s/2E2NLnXNsTy7EQe
>
> Thank you.
>
> A.
>
> "Strahil Nikolov" hunter86_bg at yahoo.com, 31 May 2021 05:23:
> > Can you provide gluster volume info, gluster volume status and gluster volume heal info summary, and most probably gluster volume status all clients/client-list
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Sun, May 30, 2021 at 15:17, a.schwibbe at gmx.net wrote:
> > > I am seeking help here after looking for solutions on the web for my distributed-replicated volume.
> > >
> > > My volume has been in operation since v3.10 and I upgraded through to 7.9, replaced nodes, replaced bricks without a problem. I love it.
> > >
> > > Finally I wanted to extend my 6x2 distributed replicated volume with arbiters for better split-brain protection.
> > >
> > > So I ran add-brick with replica 3 arbiter 1 (as I had a 6x2 I obviously added 6 arb bricks) and it successfully converted to 6 x (2 + 1) and self-heal immediately started. Looking good.
> > >
> > > Version: 7.9
> > > Number of Bricks: 6 x (2 + 1) = 18
> > > cluster.max-op-version: 70200
> > > Peers: 3 (node[0..2])
> > >
> > > Layout
> > > |node0  |node1  |node2
> > > |brick0 |brick0 |arbit0
> > > |arbit1 |brick1 |brick1
> > > ....
> > >
> > > I then noticed that the arbiter volumes on node0 & node1 have been healed successfully.
> > >
> > > Unfortunately all arbiter volumes on node2 have not been healed!
> > >
> > > I realized that the main dir on my arb mount point has been added (mount point /var/bricks/arb_0 now contains dir "brick"), however this dir on _all_ other bricks has numeric ID 33, but on this one it has 0. The brick dir on the faulty arb-volumes does contain ".glusterfs", however it has only very few entries. Other than that "brick" is empty.
> > >
> > > At that point I changed the brick dir owner with chown to 33:33 and hoped for self-heal to work. It did not.
> > > I hoped a rebalance fix-layout would fix things. It did not.
> > >
> > > I hoped a glusterd restart on node2 (as this is happening to both arb volumes on this node exclusively) would help. It did not.
> > >
> > > Active mount points via nfs-ganesha or fuse continue to work.
> > >
> > > Existing clients cause errors in the arb-brick logs on node2 for missing files or dirs, but the clients do not seem affected; r/w operations work.
> > >
> > > New clients are not able to fuse mount the volume; they fail with "authentication error".
> > >
> > > heal statistics heal-count shows several hundred files needing healing, and this count is rising. Watching df on the arb-brick mount point on node2 shows a few bytes written every now and then, but removed immediately after that.
> > >
> > > Any help/recommendation from you is highly appreciated.
> > >
> > > Thank you!
> > >
> > > A.
> > >
> > > ________
> > >
> > > Community Meeting Calendar:
> > > Schedule -
> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > Bridge: https://meet.google.com/cpu-eiue-hvk
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > https://lists.gluster.org/mailman/listinfo/gluster-users
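
The client-count comparison at the top of this message can be sketched in code. What follows is a minimal, hypothetical Python sketch, not something posted in the thread: it parses text in the "Brick : ... / Clients connected : N" shape quoted above and lists the bricks that report fewer clients than the highest count seen. The function names and the embedded sample are assumptions for illustration; real status output may carry additional lines, which this parser simply ignores.

```python
import re

# Sample in the shape quoted in the message above. In practice the text would
# be captured from the volume's client status output (assumption).
SAMPLE = """\
Brick : 192.168.0.40:/var/bricks/0/brick
Clients connected : 12
Brick : 192.168.0.41:/var/bricks/0/brick
Clients connected : 12
Brick : 192.168.0.80:/var/bricks/arb_0/brick
Clients connected : 8
"""


def client_counts(text):
    """Return {brick: connected-client count} parsed from the given text."""
    counts = {}
    brick = None
    for line in text.splitlines():
        m = re.match(r"\s*Brick\s*:\s*(\S+)", line)
        if m:
            brick = m.group(1)
            continue
        m = re.match(r"\s*Clients connected\s*:\s*(\d+)", line)
        if m and brick is not None:
            counts[brick] = int(m.group(1))
    return counts


def lagging_bricks(counts):
    """Return the bricks whose client count is below the highest count seen."""
    if not counts:
        return {}
    expected = max(counts.values())
    return {b: n for b, n in counts.items() if n < expected}


if __name__ == "__main__":
    for brick, n in lagging_bricks(client_counts(SAMPLE)).items():
        print(f"{brick}: only {n} clients connected")
```

Using the highest observed count as the reference is a simplification; it only flags a brick relative to its peers and says nothing about why clients are missing.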
a.schwibbe at gmx.net
2021-May-31 09:28 UTC
[Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Thanks Strahil,

unfortunately I cannot connect, as the mount is denied, as shown in the mount.log provided. IPs > n.n.n.100 are clients and simply cannot mount the volume. When killing the arb pids on node2, new clients can mount the volume. When bringing them up again I experience the same problem.

I wonder why the root dir on the arb bricks has the wrong UID:GID. I added regular data bricks before without any problems on node2.

Also when executing "watch df" I see

/dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
..
/dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0
..
/dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0

So the heal daemon might be trying to do something which isn't working. Thus I chowned UID:GID of ../arb_0/brick manually to match, but it did not work either.

As I added all 6 arbs at once and 4 are working as expected, I really don't get what's wrong with these...

A.

"Strahil Nikolov" hunter86_bg at yahoo.com, 31 May 2021 11:12:
> For the arb_0 I see only 8 clients, while there should be 12 clients:
>
> Brick : 192.168.0.40:/var/bricks/0/brick
> Clients connected : 12
>
> Brick : 192.168.0.41:/var/bricks/0/brick
> Clients connected : 12
>
> Brick : 192.168.0.80:/var/bricks/arb_0/brick
> Clients connected : 8
>
> Can you try to reconnect them. The most simple way is to kill the arbiter process and 'gluster volume start force', but always verify that you have both data bricks up and running.
>
> Yet, this doesn't explain why the heal daemon is not able to replicate properly.
>
> Best Regards,
> Strahil Nikolov
>
> > Meanwhile I tried reset-brick on one of the failing arbiters on node2, but with same results. The behaviour is reproducible, arbiter stays empty.
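
The UID:GID observation in the reply above could be checked across several brick roots at once. The following is a small, hypothetical Python sketch under stated assumptions: the brick paths are taken from the volume info in this thread, the expected owner 33:33 is the value the poster reports seeing on the other bricks, and the helper name is invented for illustration. Whether ownership has anything to do with the failed heal is not established in this thread.

```python
import os


def ownership_mismatches(paths, uid, gid):
    """Return {path: (st_uid, st_gid)} for every path not owned by uid:gid."""
    bad = {}
    for path in paths:
        st = os.stat(path)
        if (st.st_uid, st.st_gid) != (uid, gid):
            bad[path] = (st.st_uid, st.st_gid)
    return bad


if __name__ == "__main__":
    # Assumed brick roots on the affected node; adjust to the local layout.
    bricks = ["/var/bricks/arb_0/brick", "/var/bricks/arb_1/brick"]
    present = [b for b in bricks if os.path.isdir(b)]
    for path, (u, g) in ownership_mismatches(present, 33, 33).items():
        print(f"{path}: owned by {u}:{g}, expected 33:33")
```

The sketch only reports differences and changes nothing on disk, so it can be run on each node before deciding on any manual chown.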