Erik Jacobson
2020-Apr-09 20:36 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
Once again thanks for sticking with us. Here is a reply from Scott
Titus. If you have something for us to try, we'd love it. The code had
your patch applied when gdb was run:

Here is the addr2line output for those addresses. Very interesting
command, of which I was not aware.

[root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f735
afr_lookup_metadata_heal_check
afr-common.c:2803
[root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f0b9
afr_lookup_done
afr-common.c:2455
[root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x5c701
afr_inode_event_gen_reset
afr-common.c:755

Thanks
-Scott

On Thu, Apr 09, 2020 at 11:38:04AM +0530, Ravishankar N wrote:
>
> On 08/04/20 9:55 pm, Erik Jacobson wrote:
> > 9439138:[2020-04-08 15:48:44.737590] E [afr-common.c:754:afr_inode_event_gen_reset]
> > (-->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x6f735) [0x7fa4fb1cb735]
> > -->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x6f0b9) [0x7fa4fb1cb0b9]
> > -->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x5c701) [0x7fa4fb1b8701] )
> > 0-cm_shared-replicate-0: Resetting event gen for f2d7abf0-5444-48d6-863d-4b128502daf9
>
> Could you print the function/line no. of each of these 3 functions in the
> backtrace and see who calls afr_inode_event_gen_reset? `addr2line` should
> give you that info:
> addr2line -f -e /your/path/to/lib/glusterfs/7.2/xlator/cluster/afr.so 0x6f735
> addr2line -f -e /your/path/to/lib/glusterfs/7.2/xlator/cluster/afr.so 0x6f0b9
> addr2line -f -e /your/path/to/lib/glusterfs/7.2/xlator/cluster/afr.so 0x5c701
>
> I think it is likely called from afr_lookup_done, which I don't think is
> necessary. I will send a patch for review. Once reviews are over, I will
> share it with you and if it fixes the issue in your testing, we can merge it
> with confidence.
>
> Thanks,
> Ravi
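[Editor's note: the offsets passed to addr2line above are the +0x... frame
offsets from the replicate.so backtrace in the quoted log line. A minimal
sketch that resolves all three frames in one loop, assuming the same shared
object path and offsets shown in the thread (adjust the path for your own
glusterfs install):

  #!/bin/sh
  # Resolve each afr.so frame offset from the gluster backtrace to a
  # function name and source file:line. Path and offsets are taken from
  # the log excerpt in this thread; adjust for your version/install.
  AFR_SO=/usr/lib64/glusterfs/7.2/xlator/cluster/afr.so
  for off in 0x6f735 0x6f0b9 0x5c701; do
      echo "== frame offset $off =="
      addr2line -f -e "$AFR_SO" "$off"
  done
]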
Ravishankar N
2020-Apr-15 08:35 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
On 10/04/20 2:06 am, Erik Jacobson wrote:
> Once again thanks for sticking with us. Here is a reply from Scott
> Titus. If you have something for us to try, we'd love it. The code had
> your patch applied when gdb was run:
>
> Here is the addr2line output for those addresses. Very interesting
> command, of which I was not aware.
>
> [root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f735
> afr_lookup_metadata_heal_check
> afr-common.c:2803
> [root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f0b9
> afr_lookup_done
> afr-common.c:2455
> [root at leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x5c701
> afr_inode_event_gen_reset
> afr-common.c:755

Right, so afr_lookup_done() is resetting the event gen to zero. This
looks like a race between lookup and inode refresh code paths. We made
some changes to the event generation logic in AFR. Can you apply the
attached patch and see if it fixes the split-brain issue? It should
apply cleanly on glusterfs-7.4.

Thanks,
Ravi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-afr-mark-pending-xattrs-as-a-part-of-metadata-heal.patch
Type: text/x-patch
Size: 3813 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200415/404b33fd/attachment.bin>
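[Editor's note: for anyone testing along, a minimal sketch of applying the
attached patch to a glusterfs-7.4 source tree. The clone URL, tag name, and
patch path are assumptions, not taken from the thread:

  # Assumed workflow for trying the attached patch on glusterfs-7.4.
  git clone https://github.com/gluster/glusterfs.git
  cd glusterfs
  git checkout v7.4
  # Dry run first: verify the patch applies cleanly without changing the tree.
  git apply --check ../0001-afr-mark-pending-xattrs-as-a-part-of-metadata-heal.patch
  # The 0001- prefix suggests git-format-patch output, so git am keeps the
  # author and commit message intact.
  git am ../0001-afr-mark-pending-xattrs-as-a-part-of-metadata-heal.patch
]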