Erik Jacobson
2020-Mar-30 21:04 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
> Sadly I am not a developer, so I can't answer your questions.

I'm not a FS or network developer either. I think there is a joke about
playing one on TV but maybe it's netflix now.

Enabling certain debug options made too much information for me to watch
personally (but an expert could probably get through it).

So I started putting targeted 'print' (gf_msg) statements in the code to
see how it got its way to split-brain. Maybe this will ring a bell
for someone.

I can tell the only way we enter the split-brain path is through the
first if statement of afr_read_txn_refresh_done().

This means afr_read_txn_refresh_done() itself was passed "err" and
that it appears thin_arbiter_count was not set (which makes sense,
I'm using 1x3, not a thin arbiter).

So we jump to the readfn label, and read_subvol should still be -1.
If I read right, it must mean that this if didn't evaluate to true,
because my print statement didn't appear:
    if ((ret == 0) && spb_choice >= 0) {

So we're still with the original read_subvol == -1,
which gets us to the split-brain message.

So now I will try to learn why afr_read_txn_refresh_done() would have
'err' set in the first place. I will also learn about
afr_inode_split_brain_choice_get(). Those seem to be the two ways we
could have avoided falling into the split-brain hole here.

I put debug statements in these locations. I will mark with !!!!!! what
I see:

diff -Narup glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c
--- glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c 2020-01-15 11:43:53.887894293 -0600
+++ glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c 2020-03-30 15:45:02.917104321 -0500
@@ -279,10 +279,14 @@ afr_read_txn_refresh_done(call_frame_t *
     priv = this->private;
 
     if (err) {
-        if (!priv->thin_arbiter_count)
+        if (!priv->thin_arbiter_count) {
+            gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv->thin_arbiter_count -- goto to readfn");
!!!!!!!!!!!!!!!!!!!!!!
We hit this error condition and jump to readfn below
!!!!!!!!!!!!!!!!!!!!!!!
             goto readfn;
-        if (err != EINVAL)
+        }
+        if (err != EINVAL) {
+            gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj 2nd if in afr_read_txn_refresh_done() err != EINVAL, goto readfn");
             goto readfn;
+        }
         /* We need to query the good bricks and/or thin-arbiter.*/
         afr_ta_read_txn_synctask(frame, this);
         return 0;
@@ -291,6 +295,8 @@ afr_read_txn_refresh_done(call_frame_t *
     read_subvol = afr_read_subvol_select_by_policy(inode, this, local->readable,
                                                    NULL);
     if (read_subvol == -1) {
+        gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg whoops read_subvol returned -1, going to readfn");
+
         err = EIO;
         goto readfn;
     }
@@ -304,11 +310,15 @@ afr_read_txn_refresh_done(call_frame_t *
 readfn:
     if (read_subvol == -1) {
         ret = afr_inode_split_brain_choice_get(inode, this, &spb_choice);
-        if ((ret == 0) && spb_choice >= 0)
+        if ((ret == 0) && spb_choice >= 0) {
!!!!!!!!!!!!!!!!!!!!!!
We never get here, afr_inode_split_brain_choice_get() must not have
returned what was needed to enter.
!!!!!!!!!!!!!!!!!!!!!!
+            gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg read_subvol was -1 to begin with split brain choice found: %d", spb_choice);
             read_subvol = spb_choice;
+        }
     }
 
     if (read_subvol == -1) {
+        gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg verify this shows up above split-brain error");
!!!!!!!!!!!!!!!!!!!!!!
We hit here. Game over player.
!!!!!!!!!!!!!!!!!!!!!!
+
         AFR_SET_ERROR_AND_CHECK_SPLIT_BRAIN(-1, err);
     }
     afr_read_txn_wind(frame, this, read_subvol);
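
To make the path traced above easier to follow, here is a minimal stand-alone sketch of it in C. It is only a model under stated assumptions, not GlusterFS code: the struct, its fields and model_refresh_done() are hypothetical names, the thin-arbiter branch is left out (the volume here is plain replica 3), and the lines of the real function that the diff does not show are skipped. The only point is to show which inputs leave read_subvol at -1.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical inputs standing in for what the real function sees.
 * None of these names exist in GlusterFS. */
struct read_txn_model {
    int refresh_err;   /* "err" handed to the refresh-done step, 0 if none */
    int policy_subvol; /* subvolume the read policy would pick, -1 if none */
    int spb_ret;       /* return value of the split-brain-choice lookup */
    int spb_choice;    /* choice reported by that lookup, -1 if unset */
};

/* Returns the subvolume index to read from (>= 0), or -1 when the read
 * fails; in that case *op_errno carries the error. Assumes no thin arbiter. */
int model_refresh_done(const struct read_txn_model *m, int *op_errno)
{
    int read_subvol = -1;
    int err;

    assert(m != NULL && op_errno != NULL);
    err = m->refresh_err;
    *op_errno = 0;

    /* Without a thin arbiter, any refresh error skips policy selection. */
    if (err)
        goto readfn;

    read_subvol = m->policy_subvol;
    if (read_subvol == -1) {
        err = EIO;
        goto readfn;
    }

readfn:
    /* Last chance: an explicitly configured split-brain choice. */
    if (read_subvol == -1) {
        if (m->spb_ret == 0 && m->spb_choice >= 0)
            read_subvol = m->spb_choice;
    }

    if (read_subvol == -1) {
        *op_errno = err; /* this is where the failure is reported */
        return -1;
    }
    return read_subvol;
}
```

In this model, calling model_refresh_done() with refresh_err set and spb_choice left at -1 returns -1, which matches what the debug prints show. Either a zero refresh_err together with a usable policy_subvol, or a non-negative spb_choice, is enough to get a readable subvolume back.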
Erik Jacobson
2020-Mar-31 06:50 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
I note that this part of afr_read_txn() gets triggered a lot.

    if (afr_is_inode_refresh_reqd(inode, this, local->event_generation,
                                  event_generation)) {

Maybe that's normal when one of the three servers is down (but why isn't
it using its local copy by default?)

The comment in that if block is:

    /* servers have disconnected / reconnected, and possibly
       rebooted, very likely changing the state of freshness
       of copies */

But we have one server consistently down, not a changing situation.

More digging seemed to show this was related to cache invalidation,
because the paths seemed to suggest the inode needed refreshing, and
that seems to be handled under a case label named
GF_UPCALL_CACHE_INVALIDATION.

However, that must have been a wrong turn, since turning off cache
invalidation didn't help.

I'm struggling to wrap my head around the code base, and without the
background in these concepts it's a tough hill to climb.

I am going to bed and will have to try this again some day with fresh
eyes; the machine I have easy access to is going away in the morning.
Now I'll have to reserve time on a contended one, but I will do that and
continue digging.

Any suggestions would be greatly appreciated, as I think I'm starting to
tip over here on this one.
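
One way to read the comment quoted from afr_read_txn() is as a generation-counter check: a counter is bumped whenever a server connects or disconnects, and an inode is refreshed when the generation it last saw differs from the current one. The sketch below is a guess at that idea only. Every name in it is hypothetical, and the real afr_is_inode_refresh_reqd() may well consult other state that this model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one counter for the volume, one cached copy per inode. */
struct cluster_model {
    int event_generation;  /* bumped on every server up/down event */
};

struct inode_model {
    int cached_generation; /* generation seen at the last refresh */
};

/* A server connected or disconnected. */
void model_child_event(struct cluster_model *c)
{
    assert(c != NULL);
    c->event_generation++;
}

/* A refresh is needed only if the inode's view is out of date. */
bool model_refresh_needed(const struct inode_model *i,
                          const struct cluster_model *c)
{
    assert(i != NULL && c != NULL);
    return i->cached_generation != c->event_generation;
}

/* Completing a refresh brings the inode up to the current generation. */
void model_refresh(struct inode_model *i, const struct cluster_model *c)
{
    assert(i != NULL && c != NULL);
    i->cached_generation = c->event_generation;
}
```

If the check worked exactly like this, a server that goes down once and stays down would cost one refresh per inode and then the check would stop firing. Seeing it fire over and over with a steadily down server therefore suggests that something other than a plain generation mismatch is asking for the refresh.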