Ravishankar N
2020-Apr-05 08:35 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
On 04/04/20 9:12 pm, Erik Jacobson wrote:
> This leaves us with afr_quorum_errno() returning the error.
>
> afr_final_errno() iterates through the 'children', looking for
> valid errors within the replies for the transaction (refresh transaction?).
> The function returns the highest valued error, which must be EIO (value of 5)
> in this case.
>
> I have not looked into how or what would set the error value in the
> replies array,

The error numbers that you see in the replies array in afr_final_errno() are
set in afr_inode_refresh_subvol_cbk(). During inode refresh (which is
essentially a lookup), AFR sends the lookup request to all of its connected
children, and the reply from each one of them is captured in
afr_inode_refresh_subvol_cbk(). So adding a log here can identify whether we
got EIO from any of the children. See the attached patch for an example.

After we hear from all children, afr_inode_refresh_subvol_cbk() then calls
afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
But you already know this flow now.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: log.patch
Type: text/x-patch
Size: 840 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20200405/fbd75665/attachment.bin>
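To illustrate the "highest valued error" selection Erik describes above, here
is a minimal standalone sketch. The struct and function names below are
illustrative mocks, not the real afr_final_errno() or AFR's reply structures,
which differ in detail:

    #include <errno.h>
    #include <stdio.h>

    /* Mock of one per-child reply slot, loosely modelled on the thread's
     * description of local->replies[]; not the real AFR structure. */
    struct mock_reply {
        int valid;    /* did this child reply at all?  */
        int op_ret;   /* 0 on success, -1 on failure   */
        int op_errno; /* errno reported by this child  */
    };

    /* Return the highest-valued errno among failed, valid replies
     * (0 if nothing failed) -- the selection rule described above. */
    static int
    highest_errno(const struct mock_reply *replies, int child_count)
    {
        int err = 0;

        for (int i = 0; i < child_count; i++) {
            if (!replies[i].valid || replies[i].op_ret == 0)
                continue;
            if (replies[i].op_errno > err)
                err = replies[i].op_errno;
        }
        return err;
    }

    int
    main(void)
    {
        struct mock_reply replies[3] = {
            {1, 0, 0},    /* child 0: success         */
            {0, 0, 0},    /* child 1: no valid reply  */
            {1, -1, EIO}, /* child 2: failed with EIO */
        };

        /* Prints 5 (EIO) for this example. */
        printf("aggregated errno = %d\n", highest_errno(replies, 3));
        return 0;
    }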
Erik Jacobson
2020-Apr-05 23:49 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
First, it's possible our analysis is off somewhere. I never get to your print
message.

I put a debug statement at the start of the function so I know we get there
(just to verify my print statements were taking effect). I also put a print
statement at the "if (call_count == 0) {" check, right after the if, and ran
some tests. I suspect that isn't the problem area. There were some interesting
results with an NFS stale file handle error going through that path, but
otherwise it's always errno=0, even in the heavy test case. I'm not concerned
about a stale NFS file handle at the moment. That print was also hit heavily
when one server was down (which surprised me, but I don't know the internals).

I'm trying to re-read and work through Scott's message to see if any other
print statements might be helpful.

Thank you for your help so far. I will reply back if I find something.
Otherwise, suggestions welcome! The MFG system I can access got smaller this
weekend but is still large enough to reproduce the error.

As you can tell, I work mostly at a level well above filesystem code, so thank
you for staying with me as I struggle through this.

Erik

> After we hear from all children, afr_inode_refresh_subvol_cbk() then calls
> afr_inode_refresh_done()-->afr_txn_refresh_done()-->afr_read_txn_refresh_done().
> But you already know this flow now.

> diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
> index 4bfaef9e8..096ce06f0 100644
> --- a/xlators/cluster/afr/src/afr-common.c
> +++ b/xlators/cluster/afr/src/afr-common.c
> @@ -1318,6 +1318,12 @@ afr_inode_refresh_subvol_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
>          if (xdata)
>              local->replies[call_child].xdata = dict_ref(xdata);
>      }
> +    if (op_ret == -1)
> +        gf_msg_callingfn(
> +            this->name, GF_LOG_ERROR, op_errno, AFR_MSG_SPLIT_BRAIN,
> +            "Inode refresh on child:%d failed with errno:%d for %s(%s) ",
> +            call_child, op_errno, local->loc.name,
> +            uuid_utoa(local->loc.inode->gfid));
>      if (xdata) {
>          ret = dict_get_int8(xdata, "link-count", &need_heal);
>          local->replies[call_child].need_heal = need_heal;
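To make the difference between the two instrumentation points concrete, here
is a small standalone mock (illustrative names only, not GlusterFS code): the
per-child log added by the attached patch fires as each failed reply arrives,
while a print placed right after the "call_count == 0" check fires only once,
after the last child has answered and the per-child errnos have already been
folded together:

    #include <errno.h>
    #include <stdio.h>

    static int call_count = 3; /* pretend the refresh was sent to 3 children */

    static void
    mock_refresh_subvol_cbk(int child, int op_ret, int op_errno)
    {
        /* Per-child logging point (what the attached patch adds). */
        if (op_ret == -1)
            fprintf(stderr, "child %d failed, errno=%d\n", child, op_errno);

        /* Aggregate logging point (where the call_count == 0 print sits). */
        if (--call_count == 0)
            fprintf(stderr, "last reply received; refresh is now done\n");
    }

    int
    main(void)
    {
        mock_refresh_subvol_cbk(0, 0, 0);
        mock_refresh_subvol_cbk(1, -1, EIO);
        mock_refresh_subvol_cbk(2, 0, 0);
        return 0;
    }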