Erik Jacobson
2020-Apr-05 23:49 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
First, it's possible our analysis is off somewhere. I never get to your
print message. I put a debug statement at the start of the function so I
know we get there (just to verify my print statements were taking
effect).

I put a print statement right after the "if (call_count == 0) {" check
and ran some tests.

I suspect that isn't a problem area. There were some interesting results
with an NFS stale file handle error going through that path. Otherwise
it's always errno=0, even in the heavy test case. I'm not concerned about
a stale NFS file handle at this moment. That print was also hit heavily
when one server was down (which surprised me, but I don't know the
internals).

I'm trying to re-read and work through Scott's message to see if any
other print statements might be helpful.

Thank you for your help so far. I will reply back if I find something.
Otherwise suggestions welcome!

The MFG system I can access got smaller this weekend but is still large
enough to reproduce the error.

As you can tell, I work mostly at a level well above filesystem code, so
thank you for staying with me as I struggle through this.

Erik

> After we hear from all children, afr_inode_refresh_subvol_cbk() then calls afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
> But you already know this flow now.
>
> diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
> index 4bfaef9e8..096ce06f0 100644
> --- a/xlators/cluster/afr/src/afr-common.c
> +++ b/xlators/cluster/afr/src/afr-common.c
> @@ -1318,6 +1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
>              if (xdata)
>                  local->replies[call_child].xdata = dict_ref(xdata);
>          }
> +        if (op_ret == -1)
> +            gf_msg_callingfn(
> +                this->name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,
> +                "Inode refresh on child:%d failed with errno:%d for %s(%s) ",
> +                call_child, op_errno, local->loc.name,
> +                uuid_utoa(local->loc.inode->gfid));
>          if (xdata) {
>              ret = dict_get_int8(xdata, "link-count", &need_heal);
>              local->replies[call_child].need_heal = need_heal;
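For orientation, a rough sketch of where the print described above would
sit -- assuming the instrumented function is afr_inode_refresh_subvol_cbk()
from the patch quoted above, which dispatches to afr_inode_refresh_done()
once every child has replied (per the quoted flow). The surrounding code is
abbreviated, and the message text and the "err" variable are illustrative
stand-ins only:

    /* Sketch only, not upstream source: approximate placement of the
     * debug print described above, at the tail of the inode-refresh
     * callback once afr_frame_return() reports that the last child has
     * replied.  gf_msg() and AFR_MSG_SPLIT_BRAIN are the same logging
     * helpers the quoted patch uses; "err" stands in for whatever errno
     * the callback hands on to afr_inode_refresh_done(). */
        call_count = afr_frame_return(frame);

        if (call_count == 0) {
            /* fires exactly once per refresh, after all replies are in */
            gf_msg(this->name, GF_LOG_ERROR, 0, AFR_MSG_SPLIT_BRAIN,
                   "erikj dbg: all children replied, err=%d", err);

            /* ... existing code: afr_inode_refresh_done(frame, this, err); */
        }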
Erik Jacobson
2020-Apr-06 01:22 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
During the problem case, near as I can tell, in afr_final_errno(), in the
loop where tmp_errno = local->replies[i].op_errno is set, the errno is
always "2" (ENOENT) by the time it gets to that point on server 3 (where
the NFS load is). I never see a value other than 2.

I later also put the print at the end of the function, to double-verify
non-zero return codes. There are thousands of non-zero return codes, and
they are all 2 when not zero.

Here is an example flow right before a split-brain. I do not wish to imply
the split-brain is related; it's just an example log snip:

[2020-04-06 00:54:21.125373] E [MSGID: 0] [afr-common.c:2546:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg afr_final_errno() errno from loop before afr_higher_errno was: 2
[2020-04-06 00:54:21.125374] E [MSGID: 0] [afr-common.c:2551:afr_final_errno] 0-erikj-afr_final_errno: erikj dbg returning non-zero: 2
[2020-04-06 00:54:23.315397] E [MSGID: 0] [afr-read-txn.c:283:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv->thin_arbiter_count -- goto to readfn
[2020-04-06 00:54:23.315432] E [MSGID: 108008] [afr-read-txn.c:314:afr_read_txn_refresh_done] 0-cm_shared-replicate-0: Failing READLINK on gfid 57f269ef-919d-40ec-b7fc-a7906fee648b: split-brain observed. [Input/output error]
[2020-04-06 00:54:23.315450] W [MSGID: 112199] [nfs3-helpers.c:3327:nfs3_log_readlink_res] 0-nfs-nfsv3: /image/images_ro_nfs/rhel8.0/usr/lib64/libmlx5.so.1 => (XID: 1fdba2bc, READLINK: NFS: 5(I/O error), POSIX: 5(Input/output error)) target: (null)

I am missing something. I will see if Scott and I can work together
tomorrow.

Happy for any more ideas. Thank you!!

On Sun, Apr 05, 2020 at 06:49:56PM -0500, Erik Jacobson wrote:
> First, it's possible our analysis is off somewhere. I never get to your
> print message. I put a debug statement at the start of the function so I
> know we get there (just to verify my print statements were taking
> effect).
>
> I put a print statement right after the "if (call_count == 0) {" check
> and ran some tests.
>
> I suspect that isn't a problem area. There were some interesting results
> with an NFS stale file handle error going through that path. Otherwise
> it's always errno=0, even in the heavy test case. I'm not concerned about
> a stale NFS file handle at this moment. That print was also hit heavily
> when one server was down (which surprised me, but I don't know the
> internals).
>
> I'm trying to re-read and work through Scott's message to see if any
> other print statements might be helpful.
>
> Thank you for your help so far. I will reply back if I find something.
> Otherwise suggestions welcome!
>
> The MFG system I can access got smaller this weekend but is still large
> enough to reproduce the error.
>
> As you can tell, I work mostly at a level well above filesystem code, so
> thank you for staying with me as I struggle through this.
>
> Erik
>
> > After we hear from all children, afr_inode_refresh_subvol_cbk() then calls afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
> > But you already know this flow now.
> >
> > diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
> > index 4bfaef9e8..096ce06f0 100644
> > --- a/xlators/cluster/afr/src/afr-common.c
> > +++ b/xlators/cluster/afr/src/afr-common.c
> > @@ -1318,6 +1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
> >              if (xdata)
> >                  local->replies[call_child].xdata = dict_ref(xdata);
> >          }
> > +        if (op_ret == -1)
> > +            gf_msg_callingfn(
> > +                this->name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,
> > +                "Inode refresh on child:%d failed with errno:%d for %s(%s) ",
> > +                call_child, op_errno, local->loc.name,
> > +                uuid_utoa(local->loc.inode->gfid));
> >          if (xdata) {
> >              ret = dict_get_int8(xdata, "link-count", &need_heal);
> >              local->replies[call_child].need_heal = need_heal;
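To make the aggregation concrete, here is a small standalone sketch (plain
C, not GlusterFS source) of the behaviour described above: a loop over
per-child replies that keeps the "higher" errno, in the spirit of
afr_final_errno()/afr_higher_errno(). The reply struct, the higher_errno()
rule, and the example reply values are simplified stand-ins chosen only to
mirror the "errno from loop before afr_higher_errno was: 2" log lines;
running it prints 2 (ENOENT), the value the refresh path ends up returning.

    /* Standalone sketch, not GlusterFS source: models the errno
     * aggregation described above.  The reply struct and higher_errno()
     * are simplified stand-ins for afr_local_t's replies[] and
     * afr_higher_errno(). */
    #include <errno.h>
    #include <stdio.h>

    struct reply {
        int valid;    /* did this child reply at all?   */
        int op_ret;   /* negative on failure            */
        int op_errno; /* errno reported by the child    */
    };

    /* Simplified "keep the more significant errno" rule: the first
     * non-zero value wins, enough to show ENOENT (2) propagating. */
    static int higher_errno(int old_errno, int new_errno)
    {
        return old_errno ? old_errno : new_errno;
    }

    /* Mirrors the loop the debug prints were added around: walk every
     * valid failed reply and aggregate its op_errno. */
    static int final_errno(const struct reply *replies, int child_count)
    {
        int op_errno = 0;

        for (int i = 0; i < child_count; i++) {
            if (!replies[i].valid || replies[i].op_ret >= 0)
                continue;
            op_errno = higher_errno(op_errno, replies[i].op_errno);
        }
        return op_errno;
    }

    int main(void)
    {
        /* Illustrative replies only: one child succeeds, two fail with
         * errno 2, matching the log lines quoted above. */
        struct reply replies[3] = {
            { .valid = 1, .op_ret = 0,  .op_errno = 0 },
            { .valid = 1, .op_ret = -1, .op_errno = ENOENT },
            { .valid = 1, .op_ret = -1, .op_errno = ENOENT },
        };

        printf("final errno = %d\n", final_errno(replies, 3)); /* 2 */
        return 0;
    }

Compile and run with "cc sketch.c && ./a.out" to see the aggregated value.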