thr3ads.net - Gluster users - [Gluster-users] No healing, errno 22 [Mar 2021]

Ravishankar N

2021-Mar-15 15:40 UTC

[Gluster-users] No healing, errno 22

On 15/03/21 7:39 pm, Zenon Panoussis wrote:
> I don't know how to interpret this, but it surely looks as if
> Maildir/.Sent/cur needs to be healed on all three bricks. That
> can't be possible, logically it doesn't make sense, because if
> not even one brick has the data of an object, that object should
> not exist at all.

For the same directory, different bricks could contain different files
which are the good copies that need to be synced to the other replicas,
so the same dir being listed in the heal info output of all bricks is
not a problem.

>> Are there any file names inside
>> /gfs/gv0/.glusterfs/indices/entry-changes/011fcc1b-4d90-4c36-86ec-488aaa4db3b8
>> in any of the bricks?
> node01: empty.
> node02: 388 filenames, no directories.
> node03: 394 filenames, no directories.
>
> Would simply re-copying the entire Maildir/.Sent/cur and its contents
> to the volume solve the problem or make it worse?

Yes, if the dataset is small, you can try rm -rf of the dir from the
mount (assuming no other application is accessing them on the volume),
launch heal once so that the heal info becomes zero, and then copy it
over again.

Zenon Panoussis

2021-Mar-16 18:15 UTC

[Gluster-users] No healing, errno 22

> Yes if the dataset is small, you can try rm -rf of the dir 
> from the mount (assuming no other application is accessing 
> them on the volume) launch heal once so that the heal info 
> becomes zero and then copy it over again .
I did approximately so; the rm -rf took its sweet time and the
number of entries to be healed kept diminishing as the deletion
progressed. At the end I was left with

Mon Mar 15 22:57:09 CET 2021
Gathering count of entries to be healed on volume gv0 has been successful

Brick node01:/gfs/gv0
Number of entries: 3

Brick mikrivouli:/gfs/gv0
Number of entries: 2

Brick nanosaurus:/gfs/gv0
Number of entries: 3
--------------

and that's where I've been ever since, for the past 20 hours.
SHD has kept trying to heal them all along and the log brings
us back to square one:

[2021-03-16 14:51:35.059593 +0000] I [MSGID: 108026]
[afr-self-heal-entry.c:1053:afr_selfheal_entry_do] 0-gv0-replicate-0: performing
entry selfheal on 94aefa13-9828-49e5-9bac-6f70453c100f
[2021-03-16 15:39:43.680380 +0000] E [MSGID: 114031]
[client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-0: remote operation
failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-03-16 15:39:43.769604 +0000] E [MSGID: 114031]
[client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-2: remote operation
failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-03-16 15:39:43.908425 +0000] E [MSGID: 114031]
[client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-1: remote operation
failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[...]
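
To see at a glance which clients report the failure and with which errno, the log can be tallied. The following is a minimal sketch, not a Gluster tool: the regular expression is an assumption derived only from the three error lines quoted above, and the names FAILED_OP and count_failures are made up for illustration.

```python
import re
from collections import Counter

# Matches the translator name (e.g. "0-gv0-client-0") and the errno of a
# failed remote operation. \s+ tolerates the line wrapping seen in this mail.
# The pattern is inferred from the quoted log lines only (an assumption).
FAILED_OP = re.compile(
    r"\]\s+(?P<xlator>\d+-[\w-]+):\s+remote\s+operation\s+failed\.\s+"
    r"\[.*?\{errno=(?P<errno>\d+)\}",
    re.S,
)

def count_failures(log_text: str) -> Counter:
    """Count failed remote operations per (translator, errno) pair."""
    return Counter(
        (m.group("xlator"), int(m.group("errno")))
        for m in FAILED_OP.finditer(log_text)
    )

# Two of the entries quoted above, wrapped exactly as in the mail.
sample = """[2021-03-16 15:39:43.680380 +0000] E [MSGID: 114031]
[client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-0: remote operation
failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-03-16 15:39:43.769604 +0000] E [MSGID: 114031]
[client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-2: remote operation
failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]"""

for (xlator, err), n in sorted(count_failures(sample).items()):
    print(xlator, err, n)
```

In the excerpt above all three clients fail the same mkdir with the same errno, which suggests the request itself is being rejected rather than one brick misbehaving.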

In other words, deleting and recreating the unhealable files
and directories was a workaround, but the underlying problem
persists and I can't even begin to look for it when I have no
clue what errno 22 means in plain English.
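
For reference, errno 22 on Linux is EINVAL, "Invalid argument", which is the same text the log prints in its {error=...} field. A small sketch for decoding any errno number on the local machine, using only the Python standard library (describe_errno is an illustrative name, and the message text comes from the platform's C library):

```python
import errno
import os

def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno on the local platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"

print(describe_errno(22))
```

This only names the error class. It does not say which argument of the remote mkdir was rejected, and the path=(null) in the log gives no further hint.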

In any case, glusterd.log is full of messages like

[2021-03-16 15:37:03.398619 +0000] I [MSGID: 106533]
[glusterd-volume-ops.c:717:__glusterd_handle_cli_heal_volume] 0-management:
Received heal vol req for volume gv0
[2021-03-16 15:37:03.791452 +0000] E [MSGID: 106061]
[glusterd-server-quorum.c:260:glusterd_is_volume_in_server_quorum] 0-management:
Dict get failed [{Key=cluster.server-quorum-type}]

Every single "received heal vol req" message is immediately followed
by a "dict get failed", always for server-quorum-type, for hours on
end. And I begin to smell a bug. The CLI can query the value OK:

# gluster volume get gv0 cluster.server-quorum-type
Option                                  Value
------                                  -----
cluster.server-quorum-type              off


Checking all quorum-related settings, I get

# gluster volume get gv0 all |grep quorum
cluster.quorum-type                     auto
cluster.quorum-count                    (null) (DEFAULT)
cluster.server-quorum-type              off
cluster.server-quorum-ratio             51
cluster.quorum-reads                    no (DEFAULT)
disperse.quorum-count                   0 (DEFAULT)

I never touched any of them and none of them appear in volume info
under "Options Reconfigured", so I don't know why three of them are
not marked as defaults.
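
The rows lacking the (DEFAULT) marker can be picked out mechanically. This is a rough sketch that assumes the two-column layout shown above with an optional trailing "(DEFAULT)". split_defaults is a hypothetical helper, not part of the gluster CLI.

```python
def split_defaults(volume_get_output: str):
    """Split 'gluster volume get' rows into (explicit, default) option dicts."""
    explicit, default = {}, {}
    for line in volume_get_output.splitlines():
        parts = line.split()
        # Skip blanks, the "Option Value" header and the dashed rule:
        # real option names in the output above all contain a dot.
        if len(parts) < 2 or "." not in parts[0]:
            continue
        name, rest = parts[0], parts[1:]
        if rest[-1] == "(DEFAULT)":
            default[name] = " ".join(rest[:-1])
        else:
            explicit[name] = " ".join(rest)
    return explicit, default

# The grep output quoted above.
sample = """cluster.quorum-type                     auto
cluster.quorum-count                    (null) (DEFAULT)
cluster.server-quorum-type              off
cluster.server-quorum-ratio             51
cluster.quorum-reads                    no (DEFAULT)
disperse.quorum-count                   0 (DEFAULT)"""

explicit, default = split_defaults(sample)
for name, value in sorted(explicit.items()):
    print(name, value)
```

On the output above this lists cluster.quorum-type, cluster.server-quorum-ratio and cluster.server-quorum-type as the three options without the marker.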

Next, I tried setting server-quorum-type=server. The server-quorum-type
problem went away and I got a new kind of dict get failure:

The message "E [MSGID: 106061]
[glusterd-volgen.c:2564:brick_graph_add_pump] 0-management: Dict get failed
[{Key=enable-pump}]" repeated 2 times between [2021-03-16 17:12:18.677594
+0000] and [2021-03-16 17:12:18.779859 +0000]

I tried rolling back server-quorum-type=server and got this error:

# gluster volume set gv0 cluster.server-quorum-type off
volume set: failed: option server-quorum-type off: 'off' is not valid
(possible options are none, server.)

Aha, but previously and by default it was clearly "off", not "none".
That's a bug somewhere, and that is what was causing the dict get
failures on server-quorum-type. The missing dict key enable-pump that's
required by server-quorum-type=server also looks like a bug, because
there is no such setting:

# gluster volume get gv0 all |grep pump
#

There are more similarly strange complaints in the glusterd log:

[2021-03-16 17:25:43.134207 +0000] E [MSGID: 106434]
[glusterd-utils.c:13379:glusterd_get_value_for_vme_entry] 0-management:
xlator_volopt_dynload error (-1)
[2021-03-16 17:25:43.141816 +0000] W [MSGID: 106332]
[glusterd-utils.c:13390:glusterd_get_value_for_vme_entry] 0-management: Failed
to get option for localtime-logging key
[2021-03-16 17:25:43.143185 +0000] W [MSGID: 106332]
[glusterd-utils.c:13390:glusterd_get_value_for_vme_entry] 0-management: Failed
to get option for s3plugin-seckey key
[2021-03-16 17:25:43.143340 +0000] W [MSGID: 106332]
[glusterd-utils.c:13390:glusterd_get_value_for_vme_entry] 0-management: Failed
to get option for s3plugin-keyid key
[2021-03-16 17:25:43.143484 +0000] W [MSGID: 106332]
[glusterd-utils.c:13390:glusterd_get_value_for_vme_entry] 0-management: Failed
to get option for s3plugin-bucketid key
[2021-03-16 17:25:43.143621 +0000] W [MSGID: 106332]
[glusterd-utils.c:13390:glusterd_get_value_for_vme_entry] 0-management: Failed
to get option for s3plugin-hostname key

If none of this stuff is used in the first place, it should not
be triggering errors and warnings. If the S3 plugin is not enabled,
the S3 keys should not even be checked. Both the checking of the
keys and the error logging are bugs.

Cool, I'm discovering more and more stuff that needs fixing, but
I'm making zero progress with my healing problem. I'm still stuck
with errno=22.
