Erik Jacobson
2020-Apr-09 20:36 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
Once again thanks for sticking with us. Here is a reply from Scott
Titus. If you have something for us to try, we'd love it. The code had
your patch applied when gdb was run:

Here is the addr2line output for those addresses. Very interesting
command, of which I was not aware.

[root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f735
afr_lookup_metadata_heal_check
afr-common.c:2803
[root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f0b9
afr_lookup_done
afr-common.c:2455
[root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x5c701
afr_inode_event_gen_reset
afr-common.c:755

Thanks
-Scott

On Thu, Apr 09, 2020 at 11:38:04AM +0530, Ravishankar N wrote:
>
> On 08/04/20 9:55 pm, Erik Jacobson wrote:
> > 9439138:[2020-04-08 15:48:44.737590] E [afr-common.c:754:afr_inode_event_gen_reset]
> > (-->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x6f735) [0x7fa4fb1cb735]
> > -->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x6f0b9) [0x7fa4fb1cb0b9]
> > -->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x5c701) [0x7fa4fb1b8701] )
> > 0-cm_shared-replicate-0: Resetting event gen for f2d7abf0-5444-48d6-863d-4b128502daf9
>
> Could you print the function/line no. of each of these 3 functions in the
> backtrace and see who calls afr_inode_event_gen_reset? `addr2line` should
> give you that info:
> addr2line -f -e /your/path/to/lib/glusterfs/7.2/xlator/cluster/afr.so 0x6f735
> addr2line -f -e /your/path/to/lib/glusterfs/7.2/xlator/cluster/afr.so 0x6f0b9
> addr2line -f -e /your/path/to/lib/glusterfs/7.2/xlator/cluster/afr.so 0x5c701
>
> I think it is likely called from afr_lookup_done, which I don't think is
> necessary. I will send a patch for review. Once reviews are over, I will
> share it with you and if it fixes the issue in your testing, we can merge it
> with confidence.
>
> Thanks,
> Ravi
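[Editor's note: the offsets passed to addr2line above are the +0x... frame
offsets from the replicate.so backtrace in the quoted log line. A minimal
sketch that resolves all three frames in one loop, assuming the same shared
object path and offsets shown in the thread (adjust the path for your own
glusterfs install):

  #!/bin/sh
  # Resolve each afr.so frame offset from the gluster backtrace to a
  # function name and source file:line. Path and offsets are taken from
  # the log excerpt in this thread; adjust for your version/install.
  AFR_SO=/usr/lib64/glusterfs/7.2/xlator/cluster/afr.so
  for off in 0x6f735 0x6f0b9 0x5c701; do
      echo "== frame offset $off =="
      addr2line -f -e "$AFR_SO" "$off"
  done
]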
Ravishankar N
2020-Apr-15 08:35 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
On 10/04/20 2:06 am, Erik Jacobson wrote:
> Once again thanks for sticking with us. Here is a reply from Scott
> Titus. If you have something for us to try, we'd love it. The code had
> your patch applied when gdb was run:
>
> Here is the addr2line output for those addresses. Very interesting
> command, of which I was not aware.
>
> [root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f735
> afr_lookup_metadata_heal_check
> afr-common.c:2803
> [root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f0b9
> afr_lookup_done
> afr-common.c:2455
> [root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x5c701
> afr_inode_event_gen_reset
> afr-common.c:755

Right, so afr_lookup_done() is resetting the event gen to zero. This
looks like a race between lookup and inode refresh code paths. We made
some changes to the event generation logic in AFR. Can you apply the
attached patch and see if it fixes the split-brain issue? It should
apply cleanly on glusterfs-7.4.

Thanks,
Ravi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-afr-mark-pending-xattrs-as-a-part-of-metadata-heal.patch
Type: text/x-patch
Size: 3813 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200415/404b33fd/attachment.bin>
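[Editor's note: for anyone testing along, a minimal sketch of applying the
attached patch to a glusterfs-7.4 source tree. The clone URL, tag name, and
patch path are assumptions, not taken from the thread:

  # Assumed workflow for trying the attached patch on glusterfs-7.4.
  git clone https://github.com/gluster/glusterfs.git
  cd glusterfs
  git checkout v7.4
  # Dry run first: verify the patch applies cleanly without changing the tree.
  git apply --check ../0001-afr-mark-pending-xattrs-as-a-part-of-metadata-heal.patch
  # The 0001- prefix suggests git-format-patch output, so git am keeps the
  # author and commit message intact.
  git am ../0001-afr-mark-pending-xattrs-as-a-part-of-metadata-heal.patch
]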