thr3ads.net - Gluster users - [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed [May 2021]


a.schwibbe at gmx.net

2021-May-31 11:44 UTC

[Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

Ok, will do.


working arbiter:

ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 13 33 33 146 Mai 29 22:38 brick

ls -lna /var/bricks/arb_0/brick >>> drw------- 262 0 0 8192 Mai 29 22:38 .glusterfs
+ all data-brick dirs ...


affected arbiter:

ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 3 0 0 24 Mai 30 16:23 brick
ls -lna /var/bricks/arb_0/brick >>> drw------- 7 0 0 99 Mai 30 16:23 .glusterfs
nothing else here


find /var/bricks/arb_0/brick -not -user 33 -print

/var/bricks/arb_0/brick/.glusterfs
/var/bricks/arb_0/brick/.glusterfs/indices
/var/bricks/arb_0/brick/.glusterfs/indices/xattrop
/var/bricks/arb_0/brick/.glusterfs/indices/dirty
/var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
/var/bricks/arb_0/brick/.glusterfs/changelogs
/var/bricks/arb_0/brick/.glusterfs/changelogs/htime
/var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
/var/bricks/arb_0/brick/.glusterfs/00
/var/bricks/arb_0/brick/.glusterfs/00/00
/var/bricks/arb_0/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
/var/bricks/arb_0/brick/.glusterfs/landfill
/var/bricks/arb_0/brick/.glusterfs/unlink
/var/bricks/arb_0/brick/.glusterfs/health_check

find /var/bricks/arb_0/brick -not -group 33 -print

/var/bricks/arb_0/brick/.glusterfs
/var/bricks/arb_0/brick/.glusterfs/indices
/var/bricks/arb_0/brick/.glusterfs/indices/xattrop
/var/bricks/arb_0/brick/.glusterfs/indices/dirty
/var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
/var/bricks/arb_0/brick/.glusterfs/changelogs
/var/bricks/arb_0/brick/.glusterfs/changelogs/htime
/var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
/var/bricks/arb_0/brick/.glusterfs/00
/var/bricks/arb_0/brick/.glusterfs/00/00
/var/bricks/arb_0/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
/var/bricks/arb_0/brick/.glusterfs/landfill
/var/bricks/arb_0/brick/.glusterfs/unlink
/var/bricks/arb_0/brick/.glusterfs/health_check

The output is the same when checking against user/group 36, because all of these
entries have UID:GID 0:0. They also have 0:0 on the working arbiters.
These are all the files/dirs that exist on the affected arbs; there is nothing else
on them. There should be much more, but self-heal does not seem to be populating them.
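
For anyone who wants to repeat the ownership check from a script, the following is a minimal sketch of what `find <dir> -not -user <uid>` reports. The function name `not_owned_by` is made up for this illustration. Note that `os.walk` silently skips directories it cannot read, so it should be run as root on a brick.

```python
import os

def not_owned_by(root, uid):
    """Return sorted paths under root (root included) whose owner UID differs from uid."""
    hits = []
    if os.lstat(root).st_uid != uid:
        hits.append(root)
    # os.walk does not follow symlinks by default, like plain `find`.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects a symlink itself rather than its target.
            if os.lstat(path).st_uid != uid:
                hits.append(path)
    return sorted(hits)

# Example call with the brick path and UID quoted in this thread:
#   not_owned_by("/var/bricks/arb_0/brick", 33)
```

As noted above, `.glusterfs` is owned by 0:0 on the working arbiters too, so a non-empty result from this check alone does not identify the faulty bricks.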

Thanks.

A.


"Strahil Nikolov" hunter86_bg at yahoo.com, 31 May 2021 13:11:
> Hi,
>
> I think that the best way is to go through the logs on the affected arbiter brick (maybe even temporarily increase the log level).
>
> What is the output of:
>
> find /var/bricks/arb_0/brick -not -user 36 -print
> find /var/bricks/arb_0/brick -not -group 36 -print
>
> Maybe some files/dirs have the wrong ownership.
>
> Best Regards,
> Strahil Nikolov
>
> >
> > Thanks Strahil,
> >
> > unfortunately I cannot connect, as the mount is denied (see the mount.log I provided).
> > IPs > n.n.n.100 are clients and simply cannot mount the volume. When I kill the arb PIDs on node2, new clients can mount the volume. When I bring them up again, I experience the same problem.
> >
> > I wonder why the root dir on the arb bricks has wrong UID:GID.
> > I added regular data bricks before without any problems on node2.
> >
> > Also when executing "watch df"
> >
> > I see
> >
> > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
> > ..
> >
> > /dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0
> >
> > ..
> >
> > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
> >
> > So the heal daemon might be trying to do something that isn't working. I therefore manually chowned ../arb_0/brick to the matching UID:GID, but that did not work either.
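
The oscillation in the three `df` samples above can be made explicit with a small sketch. It assumes the third column is "Used" in 1K blocks, which is the default `df` layout but is not stated in the thread. `used_deltas` is a name chosen for this example.

```python
def used_deltas(df_lines):
    """Return the change in the 'Used' column between consecutive df samples."""
    used = [int(line.split()[2]) for line in df_lines if line.strip()]
    return [later - earlier for earlier, later in zip(used, used[1:])]

# The three samples quoted above, in the order they were observed.
samples = [
    "/dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0",
    "/dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0",
    "/dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0",
]
print(used_deltas(samples))  # [20, -20]
```

Under that assumption, about 20 KiB are written to the arbiter brick and then removed again, which fits the suspicion that something is being attempted and rolled back.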
> >
> > As I added all 6 arbs at once and 4 are working as expected, I really don't get what's wrong with these...
> >
> > A.
> >
> > "Strahil Nikolov" hunter86_bg at yahoo.com, 31 May 2021 11:12:
> > > For arb_0 I see only 8 clients, while there should be 12 clients:
> > > Brick : 192.168.0.40:/var/bricks/0/brick
> > > Clients connected : 12
> > >
> > > Brick : 192.168.0.41:/var/bricks/0/brick
> > > Clients connected : 12
> > >
> > > Brick : 192.168.0.80:/var/bricks/arb_0/brick
> > > Clients connected : 8
> > >
> > > Can you try to reconnect them? The simplest way is to kill the arbiter process and run 'gluster volume start <VOLNAME> force', but always verify that you have both data bricks up and running.
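
The comparison made above (12 clients on the data bricks, 8 on the arbiter) can be automated against pasted client-status output. This is only a sketch: it assumes the "Brick : ..." / "Clients connected : N" line format shown in this thread, and the function names are invented for the example.

```python
import re

def client_counts(status_text):
    """Map each brick to its 'Clients connected' count."""
    counts = {}
    brick = None
    for raw in status_text.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate mail quote prefixes
        m = re.match(r"Brick\s*:\s*(\S+)", line)
        if m:
            brick = m.group(1)
            continue
        m = re.match(r"Clients connected\s*:\s*(\d+)", line)
        if m and brick:
            counts[brick] = int(m.group(1))
    return counts

def under_connected(counts):
    """Return bricks that have fewer clients than the best-connected brick."""
    expected = max(counts.values())
    return {b: n for b, n in counts.items() if n < expected}

EXCERPT = """\
Brick : 192.168.0.40:/var/bricks/0/brick
Clients connected : 12
Brick : 192.168.0.41:/var/bricks/0/brick
Clients connected : 12
Brick : 192.168.0.80:/var/bricks/arb_0/brick
Clients connected : 8
"""
print(under_connected(client_counts(EXCERPT)))
```

Using the maximum as the expected count is a simplification. It only works if at least one brick has every client connected.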
> > >
> > >
> > >
> > > Yet, this doesn't explain why the heal daemon is not able to replicate properly.
> > >
> > >
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > > >
> > > > Meanwhile I tried reset-brick on one of the failing arbiters on node2, but with the same result. The behaviour is reproducible: the arbiter stays empty.
> > > >
> > > > node0: 192.168.0.40
> > > >
> > > > node1: 192.168.0.41
> > > >
> > > > node2: 192.168.0.80
> > > >
> > > > volume info:
> > > >
> > > > Volume Name: gv0
> > > > Type: Distributed-Replicate
> > > > Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559
> > > > Status: Started
> > > > Snapshot Count: 0
> > > > Number of Bricks: 6 x (2 + 1) = 18
> > > > Transport-type: tcp
> > > > Bricks:
> > > > Brick1: 192.168.0.40:/var/bricks/0/brick
> > > > Brick2: 192.168.0.41:/var/bricks/0/brick
> > > > Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter)
> > > > Brick4: 192.168.0.40:/var/bricks/2/brick
> > > > Brick5: 192.168.0.80:/var/bricks/2/brick
> > > > Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter)
> > > > Brick7: 192.168.0.40:/var/bricks/1/brick
> > > > Brick8: 192.168.0.41:/var/bricks/1/brick
> > > > Brick9: 192.168.0.80:/var/bricks/arb_1/brick (arbiter)
> > > > Brick10: 192.168.0.40:/var/bricks/3/brick
> > > > Brick11: 192.168.0.80:/var/bricks/3/brick
> > > > Brick12: 192.168.0.41:/var/bricks/arb_0/brick (arbiter)
> > > > Brick13: 192.168.0.41:/var/bricks/3/brick
> > > > Brick14: 192.168.0.80:/var/bricks/4/brick
> > > > Brick15: 192.168.0.40:/var/bricks/arb_0/brick (arbiter)
> > > > Brick16: 192.168.0.41:/var/bricks/2/brick
> > > > Brick17: 192.168.0.80:/var/bricks/5/brick
> > > > Brick18: 192.168.0.40:/var/bricks/arb_1/brick (arbiter)
> > > > Options Reconfigured:
> > > > cluster.min-free-inodes: 6%
> > > > cluster.min-free-disk: 2%
> > > > performance.md-cache-timeout: 600
> > > > cluster.rebal-throttle: lazy
> > > > features.scrub-freq: monthly
> > > > features.scrub-throttle: lazy
> > > > features.scrub: Inactive
> > > > features.bitrot: off
> > > > cluster.server-quorum-type: none
> > > > performance.cache-refresh-timeout: 10
> > > > performance.cache-max-file-size: 64MB
> > > > performance.cache-size: 781901824
> > > > auth.allow: /(192.168.0.*),/usr/andreas(192.168.0.120),/usr/otis(192.168.0.168),/usr/otis(192.168.0.111),/usr/otis(192.168.0.249),/media(192.168.0.*),/virt(192.168.0.*),/cloud(192.168.0.247),/zm(192.168.0.136)
> > > > performance.cache-invalidation: on
> > > > performance.stat-prefetch: on
> > > > features.cache-invalidation-timeout: 600
> > > > cluster.quorum-type: auto
> > > > features.cache-invalidation: on
> > > > nfs.disable: on
> > > > transport.address-family: inet
> > > > cluster.self-heal-daemon: on
> > > > cluster.server-quorum-ratio: 51%
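
To see which data bricks depend on which arbiter, the brick list above can be grouped into replica sets of three, as in "6 x (2 + 1) = 18". The sketch below assumes the bricks are listed in replica-set order, as they appear here with every third brick marked "(arbiter)". The helper names are hypothetical.

```python
import re

BRICK_RE = re.compile(r"Brick\d+:\s*([^\s:]+):(\S+)(\s+\(arbiter\))?")

def replica_sets(info_text, set_size=3):
    """Split the 'BrickN:' lines of volume info into consecutive replica sets."""
    bricks = []
    for line in info_text.splitlines():
        m = BRICK_RE.search(line)
        if m:
            # (host, path, is_arbiter)
            bricks.append((m.group(1), m.group(2), m.group(3) is not None))
    return [bricks[i:i + set_size] for i in range(0, len(bricks), set_size)]

def data_bricks_with_arbiter_on(sets, host):
    """List the data bricks of every replica set whose arbiter lives on host."""
    result = []
    for group in sets:
        if any(arb and h == host for h, _, arb in group):
            result.append([f"{h}:{p}" for h, p, arb in group if not arb])
    return result

INFO_EXCERPT = """\
Brick1: 192.168.0.40:/var/bricks/0/brick
Brick2: 192.168.0.41:/var/bricks/0/brick
Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter)
Brick4: 192.168.0.40:/var/bricks/2/brick
Brick5: 192.168.0.80:/var/bricks/2/brick
Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter)
"""
print(data_bricks_with_arbiter_on(replica_sets(INFO_EXCERPT), "192.168.0.80"))
```

In the full listing, the sets whose arbiter is on 192.168.0.80 are Brick1-3 and Brick7-9. Their data bricks (/var/bricks/0/brick and /var/bricks/1/brick on .40 and .41) are the ones that report 1006 pending entries in the heal summary further down.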
> > > >
> > > > volume status:
> > > >
> > > > Status of volume: gv0
> > > > Gluster process TCP Port RDMA Port Online Pid
> > > > ------------------------------------------------------------------------------
> > > > Brick 192.168.0.40:/var/bricks/0/brick 49155 0 Y 713066
> > > > Brick 192.168.0.41:/var/bricks/0/brick 49152 0 Y 2082
> > > > Brick 192.168.0.80:/var/bricks/arb_0/brick 49152 0 Y 26186
> > > > Brick 192.168.0.40:/var/bricks/2/brick 49156 0 Y 713075
> > > > Brick 192.168.0.80:/var/bricks/2/brick 49154 0 Y 325
> > > > Brick 192.168.0.41:/var/bricks/arb_1/brick 49157 0 Y 1746903
> > > > Brick 192.168.0.40:/var/bricks/1/brick 49157 0 Y 713084
> > > > Brick 192.168.0.41:/var/bricks/1/brick 49153 0 Y 14104
> > > > Brick 192.168.0.80:/var/bricks/arb_1/brick 49159 0 Y 2314
> > > > Brick 192.168.0.40:/var/bricks/3/brick 49153 0 Y 2978692
> > > > Brick 192.168.0.80:/var/bricks/3/brick 49155 0 Y 23269
> > > > Brick 192.168.0.41:/var/bricks/arb_0/brick 49158 0 Y 1746942
> > > > Brick 192.168.0.41:/var/bricks/3/brick 49155 0 Y 897058
> > > > Brick 192.168.0.80:/var/bricks/4/brick 49156 0 Y 27433
> > > > Brick 192.168.0.40:/var/bricks/arb_0/brick 49152 0 Y 3561115
> > > > Brick 192.168.0.41:/var/bricks/2/brick 49156 0 Y 902602
> > > > Brick 192.168.0.80:/var/bricks/5/brick 49157 0 Y 29522
> > > > Brick 192.168.0.40:/var/bricks/arb_1/brick 49154 0 Y 3561159
> > > > Self-heal Daemon on localhost N/A N/A Y 26199
> > > > Self-heal Daemon on 192.168.0.41 N/A N/A Y 2240635
> > > > Self-heal Daemon on 192.168.0.40 N/A N/A Y 3912810
> > > >
> > > > Task Status of Volume gv0
> > > > ------------------------------------------------------------------------------
> > > > There are no active volume tasks
> > > >
> > > > volume heal info summary:
> > > >
> > > > Brick 192.168.0.40:/var/bricks/0/brick <--- contains 100177 files in 25015 dirs
> > > > Status: Connected
> > > > Total Number of entries: 1006
> > > > Number of entries in heal pending: 1006
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.41:/var/bricks/0/brick
> > > > Status: Connected
> > > > Total Number of entries: 1006
> > > > Number of entries in heal pending: 1006
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.80:/var/bricks/arb_0/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.40:/var/bricks/2/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.80:/var/bricks/2/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.41:/var/bricks/arb_1/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.40:/var/bricks/1/brick
> > > > Status: Connected
> > > > Total Number of entries: 1006
> > > > Number of entries in heal pending: 1006
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.41:/var/bricks/1/brick
> > > > Status: Connected
> > > > Total Number of entries: 1006
> > > > Number of entries in heal pending: 1006
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.80:/var/bricks/arb_1/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.40:/var/bricks/3/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.80:/var/bricks/3/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.41:/var/bricks/arb_0/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.41:/var/bricks/3/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.80:/var/bricks/4/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.40:/var/bricks/arb_0/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.41:/var/bricks/2/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.80:/var/bricks/5/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
> > > >
> > > > Brick 192.168.0.40:/var/bricks/arb_1/brick
> > > > Status: Connected
> > > > Total Number of entries: 0
> > > > Number of entries in heal pending: 0
> > > > Number of entries in split-brain: 0
> > > > Number of entries possibly healing: 0
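
With 18 bricks the summary above is long, so a short filter helps to spot the bricks that still have pending heals. This sketch assumes the "Brick ..." / "Number of entries in heal pending: N" layout shown in this thread. `pending_heals` is a name picked for the example.

```python
def pending_heals(summary_text):
    """Map brick -> 'heal pending' count, keeping only non-zero counts."""
    pending = {}
    brick = None
    for raw in summary_text.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate mail quote prefixes
        if line.startswith("Brick "):
            brick = line.split()[1]
        elif line.startswith("Number of entries in heal pending:") and brick:
            count = int(line.rsplit(":", 1)[1])
            if count:
                pending[brick] = count
    return pending

SUMMARY_EXCERPT = """\
Brick 192.168.0.40:/var/bricks/0/brick
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/arb_0/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0
"""
print(pending_heals(SUMMARY_EXCERPT))
```

Applied to the full summary, only the four data bricks /var/bricks/0/brick and /var/bricks/1/brick on 192.168.0.40 and 192.168.0.41 remain, each with 1006 pending entries.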
> > > >
> > > > client-list:
> > > >
> > > > Client connections for volume gv0
> > > > Name count
> > > > ----- ------
> > > > fuse 5
> > > > gfapi.ganesha.nfsd 3
> > > > glustershd 3
> > > >
> > > > total clients for volume gv0 : 11
> > > > -----------------------------------------------------------------
> > > >
> > > > all clients: https://pro.hostit.de/nextcloud/index.php/s/tWdHox3aqb3qqbG
> > > >
> > > > failing mnt.log: https://pro.hostit.de/nextcloud/index.php/s/2E2NLnXNsTy7EQe
> > > >
> > > > Thank you.
> > > >
> > > > A.
> > > >
> > > > "Strahil Nikolov" hunter86_bg at yahoo.com, 31 May 2021 05:23:
> > > > > Can you provide gluster volume info, gluster volume status and gluster volume heal info summary, and most probably gluster volume status all clients/client-list?
> > > > >
> > > > >
> > > > > Best Regards,
> > > > > Strahil Nikolov
> > > > >
> > > > > > On Sun, May 30, 2021 at 15:17, a.schwibbe at gmx.net wrote:
> > > > > >
> > > > > > I am seeking help here after looking for solutions on the web for my distributed-replicated volume.
> > > > > >
> > > > > > My volume has been in operation since v3.10, and I have upgraded through to 7.9, replaced nodes and replaced bricks without a problem. I love it.
> > > > > >
> > > > > > Finally I wanted to extend my 6x2 distributed-replicated volume with arbiters for better split-brain protection.
> > > > > >
> > > > > >
> > > > > > So I ran add-brick with replica 3 arbiter 1 (as I had a 6x2, I obviously added 6 arb bricks), and it successfully converted to 6 x (2 + 1) and self-heal immediately started. Looking good.
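
The exact command line used for the conversion is not given in the thread. As a hedged reconstruction, the sketch below builds (but does not run) an `add-brick` call of the form `gluster volume add-brick <VOLNAME> replica 3 arbiter 1 <bricks...>`, using the six arbiter bricks in the order they appear in the volume info quoted elsewhere in this thread. The assumption that the i-th new brick joins the i-th replica pair reflects my understanding of add-brick and should be checked against the Gluster documentation.

```python
def add_arbiter_cmd(volume, arbiter_bricks):
    """Build (not execute) the argv for converting replica 2 to replica 3 arbiter 1."""
    return ["gluster", "volume", "add-brick", volume,
            "replica", "3", "arbiter", "1", *arbiter_bricks]

# One arbiter brick per existing replica pair, in subvolume order
# (Brick3, 6, 9, 12, 15 and 18 of the volume info in this thread).
ARBITERS = [
    "192.168.0.80:/var/bricks/arb_0/brick",
    "192.168.0.41:/var/bricks/arb_1/brick",
    "192.168.0.80:/var/bricks/arb_1/brick",
    "192.168.0.41:/var/bricks/arb_0/brick",
    "192.168.0.40:/var/bricks/arb_0/brick",
    "192.168.0.40:/var/bricks/arb_1/brick",
]

cmd = add_arbiter_cmd("gv0", ARBITERS)
print(" ".join(cmd))
```

Only the argument list is built here, so the sketch can be inspected safely on a machine without Gluster installed.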
> > > > > >
> > > > > >
> > > > > >
> > > > > > Version: 7.9
> > > > > >
> > > > > >
> > > > > > Number of Bricks: 6 x (2 + 1) = 18
> > > > > >
> > > > > >
> > > > > > cluster.max-op-version: 70200
> > > > > >
> > > > > >
> > > > > > Peers: 3 (node[0..2])
> > > > > >
> > > > > >
> > > > > > Layout
> > > > > >
> > > > > > |node0  |node1  |node2
> > > > > > |brick0 |brick0 |arbit0
> > > > > > |arbit1 |brick1 |brick1
> > > > > > ....
> > > > > >
> > > > > >
> > > > > >
> > > > > > I then noticed that the arbiter bricks on node0 and node1 had been healed successfully.
> > > > > >
> > > > > > Unfortunately, none of the arbiter bricks on node2 have been healed!
> > > > > >
> > > > > > I realized that the main dir on my arb mount point has been added (mount point /var/bricks/arb_0 now contains dir "brick"); however, this dir has numeric ID 33 on _all_ other bricks, but on this one it has 0. The brick dir on the faulty arbiters does contain ".glusterfs", but it has only very few entries. Other than that, "brick" is empty.
> > > > > >
> > > > > > At that point I changed the brick dir owner with chown to 33:33 and hoped for self-heal to work. It did not.
> > > > > >
> > > > > > I hoped a rebalance fix-layout would fix things. It did not.
> > > > > >
> > > > > > I hoped a glusterd restart on node2 (as this is happening to both arbiter bricks on this node exclusively) would help. It did not.
> > > > > >
> > > > > >
> > > > > > Active mount points via nfs-ganesha or fuse continue to work.
> > > > > >
> > > > > > Existing clients cause errors in the arb-brick logs on node2 for missing files or dirs, but the clients seem unaffected. r/w operations work.
> > > > > >
> > > > > >
> > > > > > New clients are not able to fuse mount the volume because of an "authentication error".
> > > > > >
> > > > > > heal statistics heal-count shows that several hundred files need healing, and this count is rising. Watching df on the arb-brick mount point on node2 shows a few bytes being written every now and then, which are removed again immediately afterwards.
> > > > > >
> > > > > >
> > > > > > Any help/recommendation from you is highly appreciated.
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > > >
> > > > > > A.
> > > > > >
> > > > > > ________
> > > > > >
> > > > > > Community Meeting Calendar:
> > > > > >
> > > > > > Schedule -
> > > > > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > > > > Bridge: https://meet.google.com/cpu-eiue-hvk
> > > > > > Gluster-users mailing list
> > > > > > Gluster-users at gluster.org
> > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > > >
> > > >
> >
> >

a.schwibbe at gmx.net

2021-May-31 16:28 UTC

[Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

I can't find anything suspicious in the brick logs other than authentication being
refused to clients that try to mount a dir which does not exist on arb_n
because self-heal isn't working.
I tried adding another node and running replace-brick on a faulty arbiter, but the
new arbiter shows the same error.

My last idea is to completely remove the first subvolume and then re-add it as new,
hoping it will work.


A.


"a.schwibbe at gmx.net" a.schwibbe at gmx.net ? 31. Mai 2021
13:44> Ok, will do.
>
>
> working arbiter:
>
> ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 13 33 33 146 Mai 29 22:38
brick
>
> ls- lna /var/bricks/arb_0/brick >>> drw------- 262 0 0 8192 Mai 29
22:38 .glusterfs
> + all data-brick dirs ...
>
>
> affected arbiter:
>
> ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 3 0 0 24 Mai 30 16:23
brick
> ls -lna /var/bricks/arb_0/brick >>> drw------- 7 0 0 99 Mai 30
16:23 .glusterfs
> nothing else here
>
>
> find /var/bricks/arb_0/brick -not -user 33 -print
>
> /var/bricks/arb_0/brick/.glusterfs
> /var/bricks/arb_0/brick/.glusterfs/indices
> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop
> /var/bricks/arb_0/brick/.glusterfs/indices/dirty
> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
> /var/bricks/arb_0/brick/.glusterfs/changelogs
> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime
> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
> /var/bricks/arb_0/brick/.glusterfs/00
> /var/bricks/arb_0/brick/.glusterfs/00/00
>
/var/bricks/arb_0/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
> /var/bricks/arb_0/brick/.glusterfs/landfill
> /var/bricks/arb_0/brick/.glusterfs/unlink
> /var/bricks/arb_0/brick/.glusterfs/health_check
>
> find /var/bricks/arb_0/brick -not -user 33 -print
>
> /var/bricks/arb_0/brick/.glusterfs
> /var/bricks/arb_0/brick/.glusterfs/indices
> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop
> /var/bricks/arb_0/brick/.glusterfs/indices/dirty
> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
> /var/bricks/arb_0/brick/.glusterfs/changelogs
> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime
> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
> /var/bricks/arb_0/brick/.glusterfs/00
> /var/bricks/arb_0/brick/.glusterfs/00/00
>
/var/bricks/arb_0/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
> /var/bricks/arb_0/brick/.glusterfs/landfill
> /var/bricks/arb_0/brick/.glusterfs/unlink
> /var/bricks/arb_0/brick/.glusterfs/health_check
>
> Output is identical to user:group 36 as all these have UID:GID 0:0, but
these files have 0:0 also on the working arbiters.
> And this is all files/dirs that exist on the affected arbs. Nothing more on
it. There should be much more, but this seems to missing self heal.
>
> Thanks.
>
> A.
>
>
> "Strahil Nikolov" hunter86_bg at yahoo.com ? 31. Mai 2021 13:11
> > Hi,
> >
> > I think that the best way is to go through the logs on the affected
arbiter brick (maybe even temporarily increase the log level).
> >
> > What is the output of:
> >
> > find /var/brick/arb_0/brick -not -user 36 -print
> > find /var/brick/arb_0/brick -not group 36 -print
> >
> > Maybe there are some files/dirs that are with wrong ownership.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > >
> > > Thanks Strahil,
> > >
> > > unfortunately I cannot connect as the mount is denied as in
mount.log provided.
> > > IPs > n.n.n..100 are clients and simply cannot mount the
volume. When killing the arb pids on node2 new clients can mount the volume.
When bringing them up again I experience the same problem.
> > >
> > > I wonder why the root dir on the arb bricks has wrong UID:GID.
> > > I added regular data bricks before without any problems on node2.
> > >
> > > Also when executing "watch df"
> > >
> > > I see
> > >
> > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
> > > ..
> > >
> > > /dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0
> > >
> > > ..
> > >
> > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
> > >
> > > So heal daemon might try to do something, which isn't
working. Thus I chowned UID:GID of ../arb_0/brick manually to match, but it did
not work either.
> > >
> > > As I added all 6 arbs at once and 4 are working as expected I
really don't get what's wrong with these...
> > >
> > > A.
> > >
> > > "Strahil Nikolov" hunter86_bg at yahoo.com ? 31. Mai
2021 11:12
> > > > For the arb_0 I seeonly 8 clients , while there should be 12
clients:
> > > > Brick : 192.168.0.40:/var/bricks/0/brick
> > > > Clients connected : 12
> > > >
> > > > Brick : 192.168.0.41:/var/bricks/0/brick
> > > > Clients connected : 12
> > > >
> > > > Brick : 192.168.0.80:/var/bricks/arb_0/brick
> > > > Clients connected : 8
> > > >
> > > > Can you try to reconnect them. The most simple way is to
kill the arbiter process and 'gluster volume start force' , but always
verify that you have both data bricks up and running.
> > > >
> > > >
> > > >
> > > > Yet, this doesn't explain why the heal daemon is not
able to replicate properly.
> > > >
> > > >
> > > >
> > > > Best Regards,
> > > > Strahil Nikolov
> > > > >
> > > > > Meanwhile I tried reset-brick on one of the failing
arbiters on node2, but with same results. The behaviour is reproducible, arbiter
stays empty.
> > > > >
> > > > > node0: 192.168.0.40
> > > > >
> > > > > node1: 192.168.0.41
> > > > >
> > > > > node3: 192.168.0.80
> > > > >
> > > > > volume info:
> > > > >
> > > > > Volume Name: gv0
> > > > > Type: Distributed-Replicate
> > > > > Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559
> > > > > Status: Started
> > > > > Snapshot Count: 0
> > > > > Number of Bricks: 6 x (2 + 1) = 18
> > > > > Transport-type: tcp
> > > > > Bricks:
> > > > > Brick1: 192.168.0.40:/var/bricks/0/brick
> > > > > Brick2: 192.168.0.41:/var/bricks/0/brick
> > > > > Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter)
> > > > > Brick4: 192.168.0.40:/var/bricks/2/brick
> > > > > Brick5: 192.168.0.80:/var/bricks/2/brick
> > > > > Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter)
> > > > > Brick7: 192.168.0.40:/var/bricks/1/brick
> > > > > Brick8: 192.168.0.41:/var/bricks/1/brick
> > > > > Brick9: 192.168.0.80:/var/bricks/arb_1/brick (arbiter)
> > > > > Brick10: 192.168.0.40:/var/bricks/3/brick
> > > > > Brick11: 192.168.0.80:/var/bricks/3/brick
> > > > > Brick12: 192.168.0.41:/var/bricks/arb_0/brick (arbiter)
> > > > > Brick13: 192.168.0.41:/var/bricks/3/brick
> > > > > Brick14: 192.168.0.80:/var/bricks/4/brick
> > > > > Brick15: 192.168.0.40:/var/bricks/arb_0/brick (arbiter)
> > > > > Brick16: 192.168.0.41:/var/bricks/2/brick
> > > > > Brick17: 192.168.0.80:/var/bricks/5/brick
> > > > > Brick18: 192.168.0.40:/var/bricks/arb_1/brick (arbiter)
> > > > > Options Reconfigured:
> > > > > cluster.min-free-inodes: 6%
> > > > > cluster.min-free-disk: 2%
> > > > > performance.md-cache-timeout: 600
> > > > > cluster.rebal-throttle: lazy
> > > > > features.scrub-freq: monthly
> > > > > features.scrub-throttle: lazy
> > > > > features.scrub: Inactive
> > > > > features.bitrot: off
> > > > > cluster.server-quorum-type: none
> > > > > performance.cache-refresh-timeout: 10
> > > > > performance.cache-max-file-size: 64MB
> > > > > performance.cache-size: 781901824
> > > > > auth.allow:
/(192.168.0.*),/usr/andreas(192.168.0.120),/usr/otis(192.168.0.168),/usr/otis(192.168.0.111),/usr/otis(192.168.0.249),/media(192.168.0.*),/virt(192.168.0.*),/cloud(192.168.0.247),/zm(192.168.0.136)
> > > > > performance.cache-invalidation: on
> > > > > performance.stat-prefetch: on
> > > > > features.cache-invalidation-timeout: 600
> > > > > cluster.quorum-type: auto
> > > > > features.cache-invalidation: on
> > > > > nfs.disable: on
> > > > > transport.address-family: inet
> > > > > cluster.self-heal-daemon: on
> > > > > cluster.server-quorum-ratio: 51%
> > > > >
> > > > > volume status:
> > > > >
> > > > > Status of volume: gv0
> > > > > Gluster process TCP Port RDMA Port Online Pid
> > > > >
------------------------------------------------------------------------------
> > > > > Brick 192.168.0.40:/var/bricks/0/brick 49155 0 Y 713066
> > > > > Brick 192.168.0.41:/var/bricks/0/brick 49152 0 Y 2082
> > > > > Brick 192.168.0.80:/var/bricks/arb_0/brick 49152 0 Y
26186
> > > > > Brick 192.168.0.40:/var/bricks/2/brick 49156 0 Y 713075
> > > > > Brick 192.168.0.80:/var/bricks/2/brick 49154 0 Y 325
> > > > > Brick 192.168.0.41:/var/bricks/arb_1/brick 49157 0 Y
1746903
> > > > > Brick 192.168.0.40:/var/bricks/1/brick 49157 0 Y 713084
> > > > > Brick 192.168.0.41:/var/bricks/1/brick 49153 0 Y 14104
> > > > > Brick 192.168.0.80:/var/bricks/arb_1/brick 49159 0 Y
2314
> > > > > Brick 192.168.0.40:/var/bricks/3/brick 49153 0 Y
2978692
> > > > > Brick 192.168.0.80:/var/bricks/3/brick 49155 0 Y 23269
> > > > > Brick 192.168.0.41:/var/bricks/arb_0/brick 49158 0 Y
1746942
> > > > > Brick 192.168.0.41:/var/bricks/3/brick 49155 0 Y 897058
> > > > > Brick 192.168.0.80:/var/bricks/4/brick 49156 0 Y 27433
> > > > > Brick 192.168.0.40:/var/bricks/arb_0/brick 49152 0 Y
3561115
> > > > > Brick 192.168.0.41:/var/bricks/2/brick 49156 0 Y 902602
> > > > > Brick 192.168.0.80:/var/bricks/5/brick 49157 0 Y 29522
> > > > > Brick 192.168.0.40:/var/bricks/arb_1/brick 49154 0 Y
3561159
> > > > > Self-heal Daemon on localhost N/A N/A Y 26199
> > > > > Self-heal Daemon on 192.168.0.41 N/A N/A Y 2240635
> > > > > Self-heal Daemon on 192.168.0.40 N/A N/A Y 3912810
> > > > >
> > > > > Task Status of Volume gv0
> > > > >
------------------------------------------------------------------------------
> > > > > There are no active volume tasks
> > > > >
> > > > > volume heal info summary:
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/0/brick <--- contains
100177 files in 25015 dirs
> > > > > Status: Connected
> > > > > Total Number of entries: 1006
> > > > > Number of entries in heal pending: 1006
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/0/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 1006
> > > > > Number of entries in heal pending: 1006
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/arb_0/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/2/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/2/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/arb_1/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/1/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 1006
> > > > > Number of entries in heal pending: 1006
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/1/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 1006
> > > > > Number of entries in heal pending: 1006
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/arb_1/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/3/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/3/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/arb_0/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/3/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/4/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/arb_0/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.41:/var/bricks/2/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.80:/var/bricks/5/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > Brick 192.168.0.40:/var/bricks/arb_1/brick
> > > > > Status: Connected
> > > > > Total Number of entries: 0
> > > > > Number of entries in heal pending: 0
> > > > > Number of entries in split-brain: 0
> > > > > Number of entries possibly healing: 0
> > > > >
> > > > > client-list:
> > > > >
> > > > > Client connections for volume gv0
> > > > > Name count
> > > > > ----- ------
> > > > > fuse 5
> > > > > gfapi.ganesha.nfsd 3
> > > > > glustershd 3
> > > > >
> > > > > total clients for volume gv0 : 11
> > > > >
-----------------------------------------------------------------
> > > > >
> > > > > > all clients: pro.hostit.de/nextcloud/index.php/s/tWdHox3aqb3qqbG
> > > > > >
> > > > > > failing mnt.log: pro.hostit.de/nextcloud/index.php/s/2E2NLnXNsTy7EQe
> > > > >
> > > > > Thank you.
> > > > >
> > > > > A.
> > > > >
> > > > > "Strahil Nikolov" hunter86_bg at yahoo.com ?
31. Mai 2021 05:23
> > > > > > > Can you provide gluster volume info, gluster volume status, gluster volume heal info summary, and most probably gluster volume status all clients/client-list?
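
The requested outputs can be gathered in one pass. The following is a minimal sketch, assuming the volume is named gv0 (the name shown in the client-list output); the function name collect_gluster_info and the RUN switch are made up for illustration. By default it only prints the commands so they can be reviewed first.

```shell
# Print (or, with RUN=1, execute) the diagnostic commands asked for above.
# The volume name defaults to gv0; pass another name as the first argument.
collect_gluster_info() {
    vol="${1:-gv0}"
    for args in \
        "volume info $vol" \
        "volume status $vol" \
        "volume heal $vol info summary" \
        "volume status $vol clients" \
        "volume status $vol client-list"
    do
        if [ "${RUN:-0}" = "1" ]; then
            # Word splitting of $args is intentional here.
            gluster $args
        else
            echo "gluster $args"
        fi
    done
}
```

Calling collect_gluster_info gv0 lists the five commands; RUN=1 collect_gluster_info gv0 runs them on a node where the gluster CLI is available.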
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > > Strahil Nikolov
> > > > > >
> > > > > > > > On Sun, May 30, 2021 at 15:17, a.schwibbe at gmx.net wrote:
> > > > > > >
> > > > > > > > I am seeking help here after looking for solutions on the web for my distributed-replicated volume.
> > > > > > >
> > > > > > > > My volume has been in operation since v3.10; I upgraded through to 7.9, replaced nodes, and replaced bricks without a problem. I love it.
> > > > > > >
> > > > > > > > Finally I wanted to extend my 6x2 distributed-replicated volume with arbiters for better split-brain protection.
> > > > > > >
> > > > > > >
> > > > > > > > So I ran add-brick with replica 3 arbiter 1 (as I had a 6x2, I obviously added 6 arb bricks), and it successfully converted to 6 x (2 + 1) and self-heal immediately started. Looking good.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Version: 7.9
> > > > > > >
> > > > > > >
> > > > > > > Number of Bricks: 6 x (2 + 1) = 18
> > > > > > >
> > > > > > >
> > > > > > > cluster.max-op-version: 70200
> > > > > > >
> > > > > > >
> > > > > > > Peers: 3 (node[0..2])
> > > > > > >
> > > > > > >
> > > > > > > > Layout
> > > > > > > >
> > > > > > > > |node0  |node1  |node2
> > > > > > > > |brick0 |brick0 |arbit0
> > > > > > > > |arbit1 |brick1 |brick1
> > > > > > > > ....
> > > > > > > >
> > > > > > > > I then noticed that the arbiter bricks on node0 & node1 have been healed successfully.
> > > > > > > >
> > > > > > > > Unfortunately, none of the arbiter bricks on node2 have been healed!
> > > > > > >
> > > > > > > > I realized that the main dir on my arb mount point has been added (mount point /var/bricks/arb_0 now contains dir "brick"); however, this dir has numeric owner ID 33 on _all_ other bricks, but 0 on this one. The brick dir on the faulty arb bricks does contain ".glusterfs", but it has only very few entries. Other than that, "brick" is empty.
> > > > > > >
> > > > > > > > At that point I changed the brick dir owner to 33:33 with chown and hoped for self-heal to work. It did not.
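
An ownership comparison like the one described can be scripted per brick. This is a small sketch, assuming the expected numeric owner is 33 as reported for the healthy bricks; the helper name list_wrong_owner is hypothetical. It uses the POSIX "!" form of the find test that appears elsewhere in this thread as -not.

```shell
# List every entry under DIR whose owner is not the expected numeric uid.
# Usage: list_wrong_owner DIR UID
list_wrong_owner() {
    dir="$1"
    uid="$2"
    find "$dir" ! -user "$uid" -print
}
```

For example, list_wrong_owner /var/bricks/arb_0/brick 33 prints the entries not owned by uid 33; running it on a healthy arbiter and on an affected one makes the two listings easy to compare.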
> > > > > > >
> > > > > > > > I hoped a rebalance fix-layout would fix things. It did not.
> > > > > > >
> > > > > > > > I hoped a glusterd restart on node2 (as this is happening to both arb bricks on this node exclusively) would help. It did not.
> > > > > > >
> > > > > > >
> > > > > > > > Active mount points via nfs-ganesha or fuse continue to work.
> > > > > > >
> > > > > > > > Existing clients cause errors in the arb-brick logs on node2 for missing files or dirs, but clients seem unaffected. r/w operations work.
> > > > > > >
> > > > > > >
> > > > > > > > New clients are not able to fuse mount the volume due to an "authentication error".
> > > > > > >
> > > > > > > > heal statistics heal-count shows several hundred files need healing, and this count is rising. Watching df on the arb-brick mount point on node2 shows a few bytes written every now and then, but then removed immediately after that.
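
To follow whether the pending count is rising, the per-brick numbers can be added up. The sketch below assumes the heal-count output contains one "Number of entries: N" line per brick; the helper name sum_heal_count is hypothetical, and gv0 is the volume name taken from the client-list output.

```shell
# Sum the per-brick "Number of entries: N" lines read from stdin.
# Non-numeric values (for example "-") count as 0.
sum_heal_count() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}
```

A possible use is gluster volume heal gv0 statistics heal-count | sum_heal_count, repeated at intervals to see the trend.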
> > > > > > >
> > > > > > >
> > > > > > > > Any help/recommendation from you is highly appreciated.
> > > > > > >
> > > > > > > Thank you!
> > > > > > >
> > > > > > >
> > > > > > > A.
> > > > > > >
> > > > > > > > ________
> > > > > > > >
> > > > > > > > Community Meeting Calendar:
> > > > > > > >
> > > > > > > > Schedule -
> > > > > > > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > > > > > > Bridge: meet.google.com/cpu-eiue-hvk
> > > > > > > > Gluster-users mailing list
> > > > > > > > Gluster-users at gluster.org
> > > > > > > > lists.gluster.org/mailman/listinfo/gluster-users
> > > > > > > >
> > > > >
> > > > >
> > >
> > >
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users at gluster.org
> lists.gluster.org/mailman/listinfo/gluster-users
>
>
