I am continuing a thread from March last year; please
see those previous postings for the background.
I am having the same problem again, but this time I have
found the cause and a way to fix it. It looks to me like
a bug, though I can't be sure.
I have a live mail spool on a replica 3 volume. It has
a standard IMAP directory structure in the form
/volume_mountpoint/username/Maildir/{new,cur,tmp} .
It is important to know that the {new,cur,tmp} directories
are automatically created by the mail server if they
do not already exist. New unseen mail is placed in new
and is automatically moved to cur when an IMAP client
sees it.
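To make that mkdir-on-demand behaviour concrete, here is a minimal sketch of Maildir-style delivery in Python. It is a toy illustration, not the mail server's actual code; the function name and layout are my own:

```python
import os
import socket
import time

def deliver(maildir, message):
    """Toy Maildir delivery: recreate new/cur/tmp if missing, write the
    message into tmp under a unique name, then rename it into new."""
    for sub in ("new", "cur", "tmp"):
        # mkdir-on-demand: this is what lets a mail server silently
        # create a fresh "new" directory if the old one is unreachable.
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Unique filename in the usual Maildir spirit: time, pid, hostname.
    name = "%d.P%d.%s" % (time.time(), os.getpid(), socket.gethostname())
    tmp_path = os.path.join(maildir, "tmp", name)
    with open(tmp_path, "wb") as f:
        f.write(message)
    # rename() is atomic within one filesystem, so a reader of new/
    # never sees a half-written message.
    new_path = os.path.join(maildir, "new", name)
    os.rename(tmp_path, new_path)
    return new_path
```

An IMAP client marking the message as seen corresponds to another rename, from new/ to cur/.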
Once again I have some entries that simply won't heal,
with the same "errno 22" in the logs as last time. All
of these unhealable entries are directories. I compared
the directories and their contents with ls -ld /path/to/dir
and ls -l /path/to/dir on the mounts of all three bricks.
The directories and their contents were identical everywhere.
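For reference, what ls cannot show is the gfid, so a mismatch like the one below stays invisible to it. One can compare the trusted.gfid extended attribute of the same directory on each brick (e.g. getfattr -n trusted.gfid -e hex, run as root on each brick server). A small Python sketch of that comparison; the helper names are mine and the brick paths would be site-specific:

```python
import os

def read_gfid(brick_path):
    """Read GlusterFS's trusted.gfid xattr from a path on a brick.
    trusted.* xattrs are privileged, so this needs root on the brick."""
    return os.getxattr(brick_path, "trusted.gfid").hex()

def gfids_agree(brick_paths, reader=read_gfid):
    """Compare the gfid of the same directory across bricks. A healthy
    replica shows one gfid; a duplicate-entry problem can show several."""
    seen = {path: reader(path) for path in brick_paths}
    return len(set(seen.values())) == 1, seen
```

ls -ld only compares mode, owner, size and mtime, which were all identical here; the gfid is the piece of identity the bricks can still disagree about.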
So I tried moving one of the unhealable directories
off the gluster replica, intending to move it back
on again afterwards. What happened surprised me greatly
(output shortened, uid:gid omitted):
# ls -l /mnt/vmail/username/Maildir/new
total 274
-rw------- 1 10050 Mar 4 08:45
1646383528.M178705P59709V000000000000002EI82CC032CE98F87ED_1.node03.nettheatre.org,S=10050
-rw------- 1 183700 Mar 4 09:26
1646385969.M991789P60955V000000000000002EI9EB5A06448629596_1.node03.nettheatre.org,S=183700
-rw------- 1 6062 Mar 4 09:52
1646387533.M495363P61757V000000000000002EIB73C97A7F4E5E243_1.node03.nettheatre.org,S=6062
-rw------- 1 15646 Mar 4 10:20
1646389259.M17459P62633V000000000000002EI97FFAE35F02DDCC8_1.node03.nettheatre.org,S=15646
-rw------- 1 9254 Mar 4 10:56
1646391406.M98944P63701V000000000000002EIBE0BA94C5363CF98_1.node03.nettheatre.org,S=9254
-rw------- 1 31719 Mar 4 11:07
1646392073.M104124P64011V000000000000002EI8BEB5A4B698B5F97_1.node03.nettheatre.org,S=31719
-rw------- 1 5782 Mar 4 12:12
1646395962.M316395P65769V000000000000002EIA75B42807A9649D5_1.node03.nettheatre.org,S=5782
-rw------- 1 16061 Mar 4 12:22
1646396577.M41309P66103V000000000000002EIA108E5579AA913E1_1.node03.nettheatre.org,S=16061
# mv /mnt/vmail/username/Maildir/new /root/
# ls -l /mnt/vmail/username/Maildir/new
total 72
-rw------- 1 1071 Oct 11 11:23
1633951401.M287288P1545V000000000000FD00I000000000164106D_1.node01.nettheatre.org,S=1071
-rw------- 1 3569 Oct 11 11:24
1633951466.M405994P1571V000000000000FD00I000000000164106E_1.node01.nettheatre.org,S=3569
-rw------- 1 2521 Oct 11 11:51
1633953065.M213650P2762V000000000000FD00I000000000164108A_1.node01.nettheatre.org,S=2521
-rw------- 1 8674 Oct 11 12:16
1633954562.M295498P4083V000000000000FD00I0000000001641099_1.node01.nettheatre.org,S=8674
-rw------- 1 8629 Oct 11 12:16
1633954562.M939396P4087V000000000000FD00I000000000164109C_1.node01.nettheatre.org,S=8629
-rw------- 1 9362 Oct 11 12:39
1633955941.M968102P5102V000000000000FD00I000000000164109D_1.node01.nettheatre.org,S=9362
-rw------- 1 12023 Oct 11 13:41
1633959672.M502160P8408V000000000000FD00I000000000164109E_1.node01.nettheatre.org,S=12023
-rw------- 1 12020 Oct 11 14:06
1633961218.M38654P9430V000000000000FD00I00000000016410A1_1.node01.nettheatre.org,S=12020
The above is the exact sequence of commands, nothing
skipped. I moved the "new" directory off the volume,
and underneath it there was another directory with the
exact same name and completely different content. That
clearly explains why the directory could not heal.
This is the same phenomenon that you get if you
mount a partition on a mountpoint that already
contains files: the contents of the newly mounted
partition mask the physical contents of the mountpoint.
How could this happen? I can only guess that the
directory "new" at some point became unavailable
on the brick that the mail server was working on.
A mail arrived for the user, so the mail server
created a "new" directory again, to which gluster
gave a new gfid. End result: two gfids for the exact
same path name. Of course it can't heal.
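My guess can be sketched as a toy simulation, deliberately ignoring quorum, AFR journalling and everything else real gluster does; Brick and mkdir_if_missing are invented names, and this is only one possible sequence that ends in two gfids for one path:

```python
import uuid

class Brick:
    """Toy brick: just a map from path to gfid, plus reachability."""
    def __init__(self):
        self.entries = {}
        self.online = True

def mkdir_if_missing(bricks, path):
    """Model the mail server's mkdir-on-demand as seen through the
    client: if no reachable brick has the path, create it with a
    fresh gfid on every brick that is reachable right now."""
    online = [b for b in bricks if b.online]
    if not any(path in b.entries for b in online):
        gfid = uuid.uuid4().hex
        for b in online:
            b.entries[path] = gfid

bricks = [Brick() for _ in range(3)]

# "new" is first created while brick 0 is unreachable...
bricks[0].online = False
mkdir_if_missing(bricks, "Maildir/new")

# ...and later mail arrives while only brick 0 is reachable,
# so the mail server recreates "new" and it gets a second gfid.
bricks[0].online = True
bricks[1].online = bricks[2].online = False
mkdir_if_missing(bricks, "Maildir/new")
bricks[1].online = bricks[2].online = True

# Two different gfids now share the exact same path: unhealable.
gfids = {b.entries["Maildir/new"] for b in bricks}
```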
And sure enough, as soon as I moved Maildir/new
off the volume, that entry healed. All I had to
do then was
# mv /root/new/* /mnt/vmail/username/Maildir/new/
to put the newer mail back where the mail server
will (hopefully) see it.
These seem relevant:
https://access.redhat.com/errata/RHBA-2021:1462
https://bugzilla.redhat.com/show_bug.cgi?id=1640148
I have
cluster.use-anonymous-inode yes
The volume in question here was created and
populated on glusterfs-server 9.x; not sure about
the minor version back then. From September 2021
until now I've been running 9.3. The double-entry
error above occurred on or shortly after October 11,
so it's certainly on 9.3.
While I suspect that this is a bug, I won't open
an issue on GitHub, (a) because I'm not completely
sure it is a bug, and (b) because there's a bot
running around there closing even confirmed bugs
that haven't been worked on for a while.
Please ask if there's anything you'd like me to
test or report.
Z