Erik Jacobson
2020-Apr-05 23:49 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
First, it's possible our analysis is off somewhere. I never get to your
print message. I put a debug statement at the start of the function so I
know we get there (just to verify my print statements were taking
effect).

I put a print statement right after the "if (call_count == 0) {" check
and ran some tests.

I suspect that isn't a problem area. There were some interesting results
with an NFS stale file handle error going through that path. Otherwise
it's always errno=0, even in the heavy test case. I'm not concerned about
a stale NFS file handle at this moment. That print was also hit heavily
when one server was down (which surprised me, but I don't know the
internals).

I'm trying to re-read and work through Scott's message to see if any
other print statements might be helpful.

Thank you for your help so far. I will reply back if I find something.
Otherwise suggestions welcome!

The MFG system I can access got smaller this weekend but is still large
enough to reproduce the error.

As you can tell, I work mostly at a level well above filesystem code, so
thank you for staying with me as I struggle through this.

Erik

> After we hear from all children, afr_inode_refresh_subvol_cbk() then calls afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
> But you already know this flow now.
>
> diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
> index 4bfaef9e8..096ce06f0 100644
> --- a/xlators/cluster/afr/src/afr-common.c
> +++ b/xlators/cluster/afr/src/afr-common.c
> @@ -1318,6 +1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
>              if (xdata)
>                  local->replies[call_child].xdata = dict_ref(xdata);
>          }
> +        if (op_ret == -1)
> +            gf_msg_callingfn(
> +                this->name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,
> +                "Inode refresh on child:%d failed with errno:%d for %s(%s) ",
> +                call_child, op_errno, local->loc.name,
> +                uuid_utoa(local->loc.inode->gfid));
>          if (xdata) {
>              ret = dict_get_int8(xdata, "link-count", &need_heal);
>              local->replies[call_child].need_heal = need_heal;
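For orientation, a rough sketch of where the print described above would
sit -- assuming the instrumented function is afr_inode_refresh_subvol_cbk()
from the patch quoted above, which dispatches to afr_inode_refresh_done()
once every child has replied (per the quoted flow). The surrounding code is
abbreviated, and the message text and the "err" variable are illustrative
stand-ins only:

    /* Sketch only, not upstream source: approximate placement of the
     * debug print described above, at the tail of the inode-refresh
     * callback once afr_frame_return() reports that the last child has
     * replied.  gf_msg() and AFR_MSG_SPLIT_BRAIN are the same logging
     * helpers the quoted patch uses; "err" stands in for whatever errno
     * the callback hands on to afr_inode_refresh_done(). */
        call_count = afr_frame_return(frame);

        if (call_count == 0) {
            /* fires exactly once per refresh, after all replies are in */
            gf_msg(this->name, GF_LOG_ERROR, 0, AFR_MSG_SPLIT_BRAIN,
                   "erikj dbg: all children replied, err=%d", err);

            /* ... existing code: afr_inode_refresh_done(frame, this, err); */
        }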
Erik Jacobson
2020-Apr-06 01:22 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
During the problem case, near as I can tell, in afr_final_errno(), in the
loop where tmp_errno = local->replies[i].op_errno is set, the errno is
always "2" (ENOENT) by the time it gets to that point on server 3 (where
the NFS load is). I never see a value other than 2.

I later also put the print at the end of the function, to double-verify
non-zero return codes. There are thousands of non-zero return codes, and
they are all 2 when not zero.

Here is an example flow right before a split-brain. I do not wish to imply
the split-brain is related; it's just an example log snip:

[2020-04-06 00:54:21.125373] E [MSGID: 0] [afr-common.c:2546:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg afr_final_errno() errno from loop before afr_higher_errno was: 2
[2020-04-06 00:54:21.125374] E [MSGID: 0] [afr-common.c:2551:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg returning non-zero: 2
[2020-04-06 00:54:23.315397] E [MSGID: 0] [afr-read-txn.c:283:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv->thin_arbiter_count -- goto to readfn
[2020-04-06 00:54:23.315432] E [MSGID: 108008] [afr-read-txn.c:314:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: Failing READLINK on gfid 57f269ef-919d-40ec-b7fc-a7906fee648b: split-brain observed. [Input/output error]
[2020-04-06 00:54:23.315450] W [MSGID: 112199] [nfs3-helpers.c:3327:nfs3_log_readlink_res] 0-nfs-nfsv3: /image/images_ro_nfs/rhel8.0/usr/lib64/libmlx5.so.1 => (XID: 1fdba2bc, READLINK: NFS: 5(I/O error), POSIX: 5(Input/output error)) target: (null)

I am missing something. I will see if Scott and I can work together
tomorrow.

Happy for any more ideas. Thank you!!

On Sun, Apr 05, 2020 at 06:49:56PM -0500, Erik Jacobson wrote:
> First, it's possible our analysis is off somewhere. I never get to your
> print message. I put a debug statement at the start of the function so I
> know we get there (just to verify my print statements were taking
> effect).
>
> I put a print statement right after the "if (call_count == 0) {" check
> and ran some tests.
>
> I suspect that isn't a problem area. There were some interesting results
> with an NFS stale file handle error going through that path. Otherwise
> it's always errno=0, even in the heavy test case. I'm not concerned about
> a stale NFS file handle at this moment. That print was also hit heavily
> when one server was down (which surprised me, but I don't know the
> internals).
>
> I'm trying to re-read and work through Scott's message to see if any
> other print statements might be helpful.
>
> Thank you for your help so far. I will reply back if I find something.
> Otherwise suggestions welcome!
>
> The MFG system I can access got smaller this weekend but is still large
> enough to reproduce the error.
>
> As you can tell, I work mostly at a level well above filesystem code, so
> thank you for staying with me as I struggle through this.
>
> Erik
>
> > After we hear from all children, afr_inode_refresh_subvol_cbk() then calls afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
> > But you already know this flow now.
> >
> > diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
> > index 4bfaef9e8..096ce06f0 100644
> > --- a/xlators/cluster/afr/src/afr-common.c
> > +++ b/xlators/cluster/afr/src/afr-common.c
> > @@ -1318,6 +1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
> >              if (xdata)
> >                  local->replies[call_child].xdata = dict_ref(xdata);
> >          }
> > +        if (op_ret == -1)
> > +            gf_msg_callingfn(
> > +                this->name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,
> > +                "Inode refresh on child:%d failed with errno:%d for %s(%s) ",
> > +                call_child, op_errno, local->loc.name,
> > +                uuid_utoa(local->loc.inode->gfid));
> >          if (xdata) {
> >              ret = dict_get_int8(xdata, "link-count", &need_heal);
> >              local->replies[call_child].need_heal = need_heal;
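To make the aggregation concrete, here is a small standalone sketch (plain
C, not GlusterFS source) of the behaviour described above: a loop over
per-child replies that keeps the "higher" errno, in the spirit of
afr_final_errno()/afr_higher_errno(). The reply struct, the higher_errno()
rule, and the example reply values are simplified stand-ins chosen only to
mirror the "errno from loop before afr_higher_errno was: 2" log lines;
running it prints 2 (ENOENT), the value the refresh path ends up returning.

    /* Standalone sketch, not GlusterFS source: models the errno
     * aggregation described above.  The reply struct and higher_errno()
     * are simplified stand-ins for afr_local_t's replies[] and
     * afr_higher_errno(). */
    #include <errno.h>
    #include <stdio.h>

    struct reply {
        int valid;    /* did this child reply at all?   */
        int op_ret;   /* negative on failure            */
        int op_errno; /* errno reported by the child    */
    };

    /* Simplified "keep the more significant errno" rule: the first
     * non-zero value wins, enough to show ENOENT (2) propagating. */
    static int higher_errno(int old_errno, int new_errno)
    {
        return old_errno ? old_errno : new_errno;
    }

    /* Mirrors the loop the debug prints were added around: walk every
     * valid failed reply and aggregate its op_errno. */
    static int final_errno(const struct reply *replies, int child_count)
    {
        int op_errno = 0;

        for (int i = 0; i < child_count; i++) {
            if (!replies[i].valid || replies[i].op_ret >= 0)
                continue;
            op_errno = higher_errno(op_errno, replies[i].op_errno);
        }
        return op_errno;
    }

    int main(void)
    {
        /* Illustrative replies only: one child succeeds, two fail with
         * errno 2, matching the log lines quoted above. */
        struct reply replies[3] = {
            { .valid = 1, .op_ret = 0,  .op_errno = 0 },
            { .valid = 1, .op_ret = -1, .op_errno = ENOENT },
            { .valid = 1, .op_ret = -1, .op_errno = ENOENT },
        };

        printf("final errno = %d\n", final_errno(replies, 3)); /* 2 */
        return 0;
    }

Compile and run with "cc sketch.c && ./a.out" to see the aggregated value.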