Ravishankar N
2020-Jan-30 06:21 UTC
[Gluster-users] interpreting heal info and reported entries
On 30/01/20 11:41 am, Ravishankar N wrote:
> I think for some reason setting of AFR xattrs on the parent dir did
> not happen, which is why the files are stuck in split-brain (instead
> of getting recreated on repo2 using the files from repo0 or 1).

Can you provide the getfattr output of the parent dir of one of the .meta files? Maybe '/372501f5-062c-4790-afdb-dd7e761828ac/images/968daf61-6858-454a-9ed4-3d3db2ae1805'.

Please share it from all three bricks.

Also, after you brought repo2 back up, can you check the glustershd.log of repo0 and repo1 to see if they contain entry-selfheal messages for the directory, something like:

"[2020-01-27 08:53:47.456205] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-engine-replicate-0: Completed entry selfheal on <$gfid-of-directory>. sources=[0] 1 sinks=2"

The timestamps should be after repo2 came back online after the upgrade.

-Ravi
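The log check requested above can be scripted. The following is a minimal Python sketch, assuming the log lines follow the layout of the sample message quoted in the request; the exact format may differ between GlusterFS versions, and the function name, its parameters, and the GFID used in the usage note are hypothetical.

```python
import re
from datetime import datetime

# Pattern modeled on the sample line quoted in the message above.
# Assumption: other GlusterFS versions may lay the line out differently.
SELFHEAL_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]"
    r".*?\[MSGID: 108026\]"
    r".*?Completed entry selfheal on (?P<gfid>\S+?)\."
    r"\s*sources=(?P<sources>.*?)\s*sinks=(?P<sinks>.*)$"
)


def find_entry_selfheals(lines, gfid=None, after=None):
    """Return (timestamp, gfid, sources, sinks) for completed entry self-heals.

    gfid:  if given, keep only messages for that directory GFID.
    after: if given (a datetime), keep only messages at or after it,
           e.g. the time the third brick came back online.
    """
    hits = []
    for line in lines:
        match = SELFHEAL_RE.search(line.strip())
        if match is None:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
        if gfid is not None and match["gfid"] != gfid:
            continue
        if after is not None and ts < after:
            continue
        hits.append((ts, match["gfid"], match["sources"], match["sinks"].strip()))
    return hits
```

To use it, open the glustershd.log file on repo0 and on repo1, pass the file object as `lines`, and set `after` to the time repo2 returned from the upgrade. An empty result for the parent directory's GFID would suggest that no entry self-heal was logged for it in that window.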
Strahil Nikolov
2020-Jan-30 15:58 UTC
[Gluster-users] interpreting heal info and reported entries
On January 30, 2020 8:21:18 AM GMT+02:00, Ravishankar N <ravishankar at redhat.com> wrote:
> ________
>
> Community Meeting Calendar:
>
> APAC Schedule -
> Every 2nd and 4th Tuesday at 11:30 AM IST
> Bridge: https://bluejeans.com/441850968
>
> NA/EMEA Schedule -
> Every 1st and 3rd Tuesday at 01:00 PM EDT
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

Hi Ravi,

This is the third time an oVirt user has reported such an issue (I am one of them, and I think my email is on the list). We need a thorough investigation, as this is recurring.

Best Regards,
Strahil Nikolov
Ravishankar N
2020-Feb-11 12:16 UTC
[Gluster-users] interpreting heal info and reported entries
On 30/01/20 9:28 pm, Strahil Nikolov wrote:
> Hi Ravi,
>
> This is the third time an oVirt user has reported such an issue (I am one of them, and I think my email is on the list).
> We need a thorough investigation, as this is recurring.

Thanks to in-house testing by Milind and Satheesaran, we now know why the parent directory did not have pending xattrs and why the files were not getting healed. You can look at https://bugzilla.redhat.com/show_bug.cgi?id=1801624 for more info.

Once the fix (https://review.gluster.org/#/c/glusterfs/+/24109/) is reviewed and merged, I will backport it to the release branches.

> Best Regards,
> Strahil Nikolov
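For readers following the thread, the "pending xattrs" on a directory can be inspected by decoding the trusted.afr.* values that getfattr prints in hex (for example with `getfattr -d -m . -e hex <dir-on-brick>`). The Python sketch below assumes the commonly documented layout of those values: 12 bytes holding three big-endian 32-bit counters for pending data, metadata, and entry operations. The function names and the sample output are hypothetical, and the xattr name `trusted.afr.engine-client-2` is only inferred from the volume name in the log line quoted earlier in the thread.

```python
import struct


def decode_afr_pending(hex_value):
    """Split one trusted.afr.* value into (data, metadata, entry) counters."""
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    return struct.unpack(">III", raw)


def parse_getfattr(text):
    """Map each trusted.afr.* name in getfattr hex output to its counters."""
    pending = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        # Lines without "=" (such as the "# file:" header) are skipped.
        if sep and name.startswith("trusted.afr."):
            pending[name] = decode_afr_pending(value)
    return pending


# Hypothetical output from one brick; the value is made up for illustration.
SAMPLE = """\
# file: path/to/parent-dir
trusted.afr.engine-client-2=0x000000000000000000000003
"""

print(parse_getfattr(SAMPLE))
```

Under these assumptions, a non-zero third counter on repo0 or repo1 would mean entry operations are still pending against the brick that the xattr names. All-zero counters on the parent directory on every brick would match the symptom described in this thread, where the pending xattrs were never set.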