Ravishankar N
2020-Apr-09 06:08 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
On 08/04/20 9:55 pm, Erik Jacobson wrote:
> 9439138:[2020-04-08 15:48:44.737590] E [afr-common.c:754:afr_inode_event_gen_reset]
> (-->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x6f735) [0x7fa4fb1cb735]
> -->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x6f0b9) [0x7fa4fb1cb0b9]
> -->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x5c701) [0x7fa4fb1b8701] )
> 0-cm_shared-replicate-0: Resetting event gen for f2d7abf0-5444-48d6-863d-4b128502daf9

Could you print the function/line no. of each of these 3 functions in the backtrace and see who calls afr_inode_event_gen_reset? `addr2line` should give you that info:

  addr2line -f -e /your/path/to/lib/glusterfs/7.2/xlator/cluster/afr.so 0x6f735
  addr2line -f -e /your/path/to/lib/glusterfs/7.2/xlator/cluster/afr.so 0x6f0b9
  addr2line -f -e /your/path/to/lib/glusterfs/7.2/xlator/cluster/afr.so 0x5c701

I think it is likely called from afr_lookup_done, which I don't think is necessary. I will send a patch for review. Once reviews are over, I will share it with you and if it fixes the issue in your testing, we can merge it with confidence.

Thanks,
Ravi
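If there are many such log lines, the offsets can be pulled out mechanically instead of being copied by hand. The following is only a rough sketch: `frames_to_addr2line` is a hypothetical helper name, not part of glusterfs or binutils, and it assumes the frames appear in the `(+0xOFFSET)` form shown in the log above and that your `grep` supports `-o`. The library path is the same placeholder as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of glusterfs or binutils).
# Reads log text on stdin and prints one addr2line command for every
# "(+0xOFFSET)" frame it finds. $1 is the path to the library that
# contains the debug symbols.
frames_to_addr2line() {
    lib="$1"
    # grep -o emits each "(+0x...)" match on its own line;
    # tr strips the surrounding "(", "+" and ")" so only 0x... remains.
    grep -o '(+0x[0-9a-fA-F]*)' | tr -d '(+)' | while read -r off; do
        printf 'addr2line -f -e %s %s\n' "$lib" "$off"
    done
}

# Usage with the three frames from the log line quoted above.
# The path is a placeholder; substitute your own install location.
printf '%s\n' \
    '(-->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x6f735) [0x7fa4fb1cb735]' \
    '-->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x6f0b9) [0x7fa4fb1cb0b9]' \
    '-->/usr/lib64/glusterfs/7.2/xlator/cluster/replicate.so(+0x5c701) [0x7fa4fb1b8701] )' |
    frames_to_addr2line /your/path/to/lib/glusterfs/7.2/xlator/cluster/afr.so
```

The helper only prints the commands rather than running them, so you can check the path and offsets first and then pipe the output to `sh` if they look right.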
Erik Jacobson
2020-Apr-09 20:36 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
Once again thanks for sticking with us. Here is a reply from Scott Titus. If you have something for us to try, we'd love it. The code had your patch applied when gdb was run:

Here is the addr2line output for those addresses. Very interesting command, of which I was not aware.

[root@leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f735
afr_lookup_metadata_heal_check
afr-common.c:2803
[root@leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x6f0b9
afr_lookup_done
afr-common.c:2455
[root@leader3 ~]# addr2line -f -e /usr/lib64/glusterfs/7.2/xlator/cluster/afr.so 0x5c701
afr_inode_event_gen_reset
afr-common.c:755

Thanks
-Scott