thr3ads.net - Gluster users - [Gluster-users] interpreting heal info and reported entries [Feb 2020]

Ravishankar N

2020-Jan-30 06:21 UTC

[Gluster-users] interpreting heal info and reported entries

On 30/01/20 11:41 am, Ravishankar N wrote:
> I think for some reason setting of AFR xattrs on the parent dir did
> not happen, which is why the files are stuck in split-brain (instead
> of getting recreated on repo2 using the files from repo0 or 1).

Can you provide the getfattr output of the parent dir of one of the 
.meta files? Maybe 
'/372501f5-062c-4790-afdb-dd7e761828ac/images/968daf61-6858-454a-9ed4-3d3db2ae1805'.
Please share it from all three bricks.
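A minimal Python sketch for reading that output, assuming it was captured with `getfattr -d -m . -e hex <dir>` on each brick. It assumes each `trusted.afr.*` value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry), and that the client xattrs are named after the volume `engine` seen in the `0-engine-replicate-0` log prefix. `parse_afr_pending`, `entry_blamed` and the sample values are hypothetical. If the parent directory on repo0 and repo1 shows no non-zero entry counter for the client that maps to repo2, that would be consistent with the missing pending xattrs suspected above.

```python
import re
from typing import Dict, List, Tuple

# Matches one AFR changelog line from `getfattr -d -m . -e hex <dir>`, e.g.
# trusted.afr.engine-client-2=0x000000000000000000000001
AFR_LINE = re.compile(r"^(trusted\.afr\.[\w.-]+)=0x([0-9a-fA-F]{24})$")


def parse_afr_pending(getfattr_output: str) -> Dict[str, Tuple[int, int, int]]:
    """Map each trusted.afr.* xattr to its (data, metadata, entry) counters.

    Assumption: the 12-byte value holds three big-endian 32-bit counters.
    """
    pending: Dict[str, Tuple[int, int, int]] = {}
    for raw in getfattr_output.splitlines():
        match = AFR_LINE.match(raw.strip())
        if match is None:
            continue  # skip "# file:" headers, trusted.gfid and other xattrs
        name, hexval = match.groups()
        data, metadata, entry = (int(hexval[i:i + 8], 16) for i in (0, 8, 16))
        pending[name] = (data, metadata, entry)
    return pending


def entry_blamed(pending: Dict[str, Tuple[int, int, int]]) -> List[str]:
    """Names of the xattrs whose entry (directory) counter is non-zero."""
    return sorted(name for name, (_, _, entry) in pending.items() if entry > 0)


# Hypothetical output from one brick; the path and values are made up.
SAMPLE = """\
# file: hypothetical/brick/path/to/parent-dir
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-2=0x000000000000000000000003
trusted.gfid=0x00000000000000000000000000000001
"""

if __name__ == "__main__":
    counters = parse_afr_pending(SAMPLE)
    print(counters)
    print(entry_blamed(counters))
```

Running it against each brick's output side by side makes it easier to see which brick, if any, is blaming repo2 for the directory.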

Also, after you brought repo2 back up, can you check the glustershd.log 
of repo0 and repo1 to see if they contain entry-selfheal messages for 
the directory, something like:

"[2020-01-27 08:53:47.456205] I [MSGID: 108026] 
[afr-self-heal-common.c:1750:afr_log_selfheal] 0-engine-replicate-0: 
Completed entry selfheal on <$gfid-of-directory>. sources=[0] 1
sinks=2"

The timestamps should be after repo2 came back online after the upgrade.
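A similar sketch for the log check, assuming glustershd.log sits at the usual `/var/log/glusterfs/glustershd.log` and that each message occupies one physical line (the example above is only wrapped by the mail client). `entry_heals_after` and `EntryHeal` are hypothetical names, and the regular expression is modeled only on the sample message, so other GlusterFS versions may format it differently. The cut-off should be the time repo2 came back online, expressed in the same time zone the log uses.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple

# Modeled on the sample message quoted above (one physical line in the log):
# [2020-01-27 08:53:47.456205] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal]
# 0-engine-replicate-0: Completed entry selfheal on <gfid>. sources=[0] 1  sinks=2
ENTRY_HEAL = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] (?P<level>[A-Z]) "
    r"\[MSGID: 108026\] \[[^\]]+\] (?P<subvol>\S+): "
    r"Completed entry selfheal on (?P<gfid>[0-9a-fA-F-]+)\. "
    r"sources=(?P<sources>.*?)\s+sinks=(?P<sinks>.*)$"
)


class EntryHeal(NamedTuple):
    when: datetime
    subvol: str
    gfid: str
    sources: str  # raw text, e.g. "[0] 1"
    sinks: str    # raw text, e.g. "2"


def entry_heals_after(lines: Iterable[str], since: datetime) -> List[EntryHeal]:
    """Return entry-selfheal completions logged strictly after `since`."""
    found: List[EntryHeal] = []
    for line in lines:
        match = ENTRY_HEAL.match(line.strip())
        if match is None:
            continue  # not an entry-selfheal completion message
        when = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
        if when > since:
            found.append(EntryHeal(when, match["subvol"], match["gfid"],
                                   match["sources"].strip(), match["sinks"].strip()))
    return found


# Example (hypothetical cut-off; use the time repo2 rejoined):
# with open("/var/log/glusterfs/glustershd.log") as log:
#     for heal in entry_heals_after(log, datetime(2020, 1, 27, 8, 0)):
#         print(heal.when, heal.gfid, heal.sources, heal.sinks)
```

An empty result for the directory's gfid on both repo0 and repo1 would suggest the entry self-heal never ran after repo2 returned.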

-Ravi

Strahil Nikolov

2020-Jan-30 15:58 UTC

[Gluster-users] interpreting heal info and reported entries

On January 30, 2020 8:21:18 AM GMT+02:00, Ravishankar N <ravishankar at redhat.com> wrote:
>
>On 30/01/20 11:41 am, Ravishankar N wrote:
>> I think for some reason setting of AFR xattrs on the parent dir did 
>> not happen, which is why the files are stuck in split-brain (instead 
>> of getting recreated on repo2 using the files from repo0 or 1). 
>
>Can you provide the getfattr output of the parent dir of one of the 
>.meta files? Maybe 
>'/372501f5-062c-4790-afdb-dd7e761828ac/images/968daf61-6858-454a-9ed4-3d3db2ae1805'.
>
>Please share it from all three bricks.
>
>Also, after you brought repo2 back up, can you check the glustershd.log
>of repo0 and repo1 to see if they contain entry-selfheal messages for
>the directory, something like:
>
>"[2020-01-27 08:53:47.456205] I [MSGID: 108026] 
>[afr-self-heal-common.c:1750:afr_log_selfheal] 0-engine-replicate-0: 
>Completed entry selfheal on <$gfid-of-directory>. sources=[0] 1
>sinks=2"
>
>The timestamps should be after repo2 came back online after the
>upgrade.
>
>-Ravi
>
>________
>
>Community Meeting Calendar:
>
>APAC Schedule -
>Every 2nd and 4th Tuesday at 11:30 AM IST
>Bridge: https://bluejeans.com/441850968
>
>NA/EMEA Schedule -
>Every 1st and 3rd Tuesday at 01:00 PM EDT
>Bridge: https://bluejeans.com/441850968
>
>Gluster-users mailing list
>Gluster-users at gluster.org
>https://lists.gluster.org/mailman/listinfo/gluster-users

Hi Ravi,

This is the third time an oVirt user (one of them is me, and I think my email is in the
list) has reported such an issue.
We need a thorough investigation, as this keeps recurring.

Best Regards,
Strahil Nikolov

Ravishankar N

2020-Feb-11 12:16 UTC

[Gluster-users] interpreting heal info and reported entries

On 30/01/20 9:28 pm, Strahil Nikolov wrote:
> On January 30, 2020 8:21:18 AM GMT+02:00, Ravishankar N <ravishankar at redhat.com> wrote:
>> On 30/01/20 11:41 am, Ravishankar N wrote:
>>> I think for some reason setting of AFR xattrs on the parent dir did
>>> not happen, which is why the files are stuck in split-brain (instead
>>> of getting recreated on repo2 using the files from repo0 or 1).
>> Can you provide the getfattr output of the parent dir of one of the
>> .meta files? Maybe
>> '/372501f5-062c-4790-afdb-dd7e761828ac/images/968daf61-6858-454a-9ed4-3d3db2ae1805'.
>>
>> Please share it from all three bricks.
>>
>> Also, after you brought repo2 back up, can you check the glustershd.log
>> of repo0 and repo1 to see if they contain entry-selfheal messages for
>> the directory, something like:
>>
>> "[2020-01-27 08:53:47.456205] I [MSGID: 108026]
>> [afr-self-heal-common.c:1750:afr_log_selfheal] 0-engine-replicate-0:
>> Completed entry selfheal on <$gfid-of-directory>. sources=[0] 1
>> sinks=2"
>>
>> The timestamps should be after repo2 came back online after the
>> upgrade.
>>
>> -Ravi
>>
> Hi Ravi,
>
> This is the third time an oVirt user (one of them is me, and I think my email is in
> the list) has reported such an issue.
> We need a thorough investigation, as this keeps recurring.
Thanks to in-house testing by Milind and Satheesaran, we now know why 
the parent directory did not have pending xattrs and why the files were 
not getting healed. You can look at 
https://bugzilla.redhat.com/show_bug.cgi?id=1801624 for more info. Once 
the fix (https://review.gluster.org/#/c/glusterfs/+/24109/) is reviewed 
and merged, I will backport it to the release branches.
>
> Best Regards,
> Strahil Nikolov
>
