Erik Jacobson
2020-Apr-02 07:02 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
> Hmm, afr_inode_refresh_done() is called with error=0 and by the time we
> reach afr_txn_refresh_done(), it becomes 5 (i.e. EIO).
> So afr_inode_refresh_done() is changing it to 5. Maybe you can put
> breakpoints/log messages in afr_inode_refresh_done() at the places where
> error is getting changed and see where the assignment happens.

I had a lot of struggles tonight getting the system ready to go. I had
SIGSEGVs (signal 11) in glusterfs (nfs), but I think that was related to
not all brick processes stopping with glusterd, or possibly to the
re-install and/or the print statements; I'm not sure. I'm not used to
seeing that.

I put print statements everywhere I thought error could change and got
no printed log messages.

I put breakpoints where error would change and we didn't hit them.

I then put a conditional breakpoint at the call site:

  break xlators/cluster/afr/src/afr-common.c:1298 if error != 0

  ---> refresh_done:
           afr_txn_refresh_done(frame, this, error);

And it never triggered (despite split-brain messages and my crapola
message).

So I'm unable to explain this transition. I'm also not a gdb expert.
I still see the same backtrace though:

  #1  0x00007fff68938d7b in afr_txn_refresh_done (
      frame=frame@entry=0x7ffd540ed8e8, this=this@entry=0x7fff64013720,
      err=5, err@entry=0) at afr-common.c:1222
  #2  0x00007fff689391f0 in afr_inode_refresh_done (
      frame=frame@entry=0x7ffd540ed8e8, this=this@entry=0x7fff64013720,
      error=0) at afr-common.c:1299

Is there other advice you might have for me to try?

I'm very eager to solve this problem, which is why I'm staying up late
to get machine time. I must go to bed now. I look forward to another
shot tomorrow night if you have more ideas to try.
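One thing I plan to try next session, in case the assignment happens
somewhere my breakpoints don't cover: a watchpoint on err once execution
is inside afr_txn_refresh_done, so gdb stops wherever the value actually
changes instead of where I guessed it might. A rough sketch (untested on
this build; assumes err stays somewhere gdb can watch):

  (gdb) break afr_txn_refresh_done
  (gdb) continue
  ... breakpoint hits ...
  (gdb) watch err
  (gdb) continue
  ... gdb stops at the statement that modifies err; bt from there ...

gdb drops the watchpoint when the frame exits, so it would need to be
re-armed each time the breakpoint hits. If someone knows a better way
to catch a parameter changing mid-function, I'm all ears.

Erik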
Erik Jacobson
2020-Apr-07 23:29 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
My co-worker prefers that I keep driving the discussion since this isn't
his normal area, but he's far better at digging into these low-level
calls than I am, so I am passing along his analysis so far. We are
wondering if we have enough information yet to turn on any light bulbs
in terms of volume settings, system settings, or a code change, or to
suggest a path for further debugging.

Recall from earlier in the thread: this is a 3-way replicated,
single-subvolume gluster filesystem that gets split-brain errors under
heavy gnfs load when one of the three servers has gone down,
representing a customer-reported problem.

Scott's analysis is below. Next steps truly appreciated!!

Apologies for misinterpreting the backtrace.

#0  afr_read_txn_refresh_done (frame=0x7ffcf4146478,
    this=0x7fff64013720, err=5) at afr-read-txn.c:312
#1  0x00007fff68938d2b in afr_txn_refresh_done (
    frame=frame@entry=0x7ffcf4146478, this=this@entry=0x7fff64013720,
    err=5, err@entry=0) at afr-common.c:1222
#2  0x00007fff68939003 in afr_inode_refresh_done (
    frame=frame@entry=0x7ffcf4146478, this=this@entry=0x7fff64013720,
    error=0) at afr-common.c:1294

Frames #1 and #2 are not sitting at the entry points of
afr_txn_refresh_done and afr_inode_refresh_done; each is stopped at a
call made from within those functions. So frame #1 is at the call on
line 1222 of afr-common.c, inside afr_txn_refresh_done:

1163: int
1164: afr_txn_refresh_done(call_frame_t *frame, xlator_t *this, int err)
1165: {
1166:     call_frame_t *heal_frame = NULL;
1167:     afr_local_t *heal_local = NULL;
1168:     afr_local_t *local = NULL;
1169:     afr_private_t *priv = NULL;
1170:     inode_t *inode = NULL;
1171:     int event_generation = 0;
1172:     int read_subvol = -1;
1173:     int ret = 0;
1174:
1175:     local = frame->local;
1176:     inode = local->inode;
1177:     priv = this->private;
1178:
1179:     if (err)
1180:         goto refresh_done;
1181:
1182:     if (local->op == GF_FOP_LOOKUP)
1183:         goto refresh_done;
1184:
1185:     ret = afr_inode_get_readable(frame, inode, this, local->readable,
1186:                                  &event_generation, local->transaction.type);
1187:
1188:     if (ret == -EIO || (local->is_read_txn && !event_generation)) {
1189:         /* No readable subvolume even after refresh ==> splitbrain.*/
1190:         if (!priv->fav_child_policy) {       <=== no favorite child policy,
1191:             err = EIO;                       <=== so err is set to EIO
1192:             goto refresh_done;
1193:         }
1194:         read_subvol = afr_sh_get_fav_by_policy(this, local->replies, inode,
1195:                                                NULL);
1196:         if (read_subvol == -1) {
1197:             err = EIO;
1198:             goto refresh_done;
1199:         }
1200:
1201:         heal_frame = afr_frame_create(this, NULL);
1202:         if (!heal_frame) {
1203:             err = EIO;
1204:             goto refresh_done;
1205:         }
1206:         heal_local = heal_frame->local;
1207:         heal_local->xdata_req = dict_new();
1208:         if (!heal_local->xdata_req) {
1209:             err = EIO;
1210:             AFR_STACK_DESTROY(heal_frame);
1211:             goto refresh_done;
1212:         }
1213:         heal_local->heal_frame = frame;
1214:         ret = synctask_new(this->ctx->env, afr_fav_child_reset_sink_xattrs,
1215:                            afr_fav_child_reset_sink_xattrs_cbk, heal_frame,
1216:                            heal_frame);
1217:         return 0;
1218:     }
1219:
1220: refresh_done:
1221:     afr_local_replies_wipe(local, this->private);
1222:     local->refreshfn(frame, this, err);      <=== backtrace frame #1
1223:
1224:     return 0;
1225: }

So backtrace frame #1 represents the call

  local->refreshfn(frame, this, err);

with err == 5 by that point (err@entry=0 is the value it was called
with). Line 1191 is the first place EIO can be set on this path. Setting
a breakpoint at line 1190 (if (!priv->fav_child_policy)) reveals that
ret is not set, but local->is_read_txn is true and event_generation is
zero (see xlators/cluster/afr/src/afr.h:108), so the conditional at line
1188 is true. Furthermore, priv->fav_child_policy is AFR_FAV_CHILD_NONE,
which is zero, so we have found where the error is set to EIO: line
1191. The gdb output:

(gdb) print ret
$1 = 0
(gdb) print local->is_read_txn
$2 = true
(gdb) print event_generation
$3 = 0
(gdb) print priv->fav_child_policy
$4 = AFR_FAV_CHILD_NONE

I am not sure what this signifies, though. It appears to be a read
transaction with no event generation and no favorite child policy. (If I
have the mapping right, priv->fav_child_policy is driven by the
cluster.favorite-child-policy volume option, which is at its default of
none here.)

Feel free to ask for clarification in case my thought process went awry
somewhere.

Thanks,
-Scott
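A note from me (Erik), since the "err=5, err@entry=0" pair is what threw
me earlier in the thread: gdb prints both a parameter's current value
and, when the compiler emitted entry-value debug info, the value the
parameter had when the function was entered. So frame #1 says
afr_txn_refresh_done was entered with err == 0 and the function itself
changed it to 5 (at line 1191, per Scott's analysis) before making the
call at line 1222. Here is a toy program, hypothetical and unrelated to
the gluster sources, that produces the same output shape when built with
gcc -g -O2 on Linux (exact gdb formatting can vary by toolchain):

  /* entry_demo.c: illustrate gdb's "param=<now>, param@entry=<then>".
   * Build:  gcc -g -O2 entry_demo.c -o entry_demo
   * Run under gdb and type "bt" after the abort. */
  #include <stdlib.h>

  __attribute__((noinline)) void
  leaf(int err)
  {
      (void)err;
      abort();                /* force a backtrace here */
  }

  __attribute__((noinline)) void
  txn_refresh_done(int err)   /* entered with err == 0 */
  {
      if (err == 0)
          err = 5;            /* changed inside, like err = EIO at 1191 */
      leaf(err);              /* like local->refreshfn() at 1222 */
  }

  int
  main(int argc, char **argv)
  {
      /* Pass a runtime value so the compiler can't constant-fold it. */
      txn_refresh_done(argc > 1 ? atoi(argv[1]) : 0);
      return 0;
  }

In the resulting backtrace, the txn_refresh_done frame shows
"err=5, err@entry=0": the caller passed 0, and the assignment happened
inside the function. That matches what Scott found, and it means the
interesting question is not who passed EIO in, but why the conditional
at line 1188 is true under this load.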
On Thu, Apr 02, 2020 at 02:02:46AM -0500, Erik Jacobson wrote:
> > Hmm, afr_inode_refresh_done() is called with error=0 and by the time we
> > reach afr_txn_refresh_done(), it becomes 5 (i.e. EIO).
> > So afr_inode_refresh_done() is changing it to 5. Maybe you can put
> > breakpoints/log messages in afr_inode_refresh_done() at the places where
> > error is getting changed and see where the assignment happens.
>
> I had a lot of struggles tonight getting the system ready to go. I had
> SIGSEGVs (signal 11) in glusterfs (nfs), but I think that was related to
> not all brick processes stopping with glusterd, or possibly to the
> re-install and/or the print statements; I'm not sure. I'm not used to
> seeing that.
>
> I put print statements everywhere I thought error could change and got
> no printed log messages.
>
> I put breakpoints where error would change and we didn't hit them.
>
> I then put a conditional breakpoint at the call site:
>
>   break xlators/cluster/afr/src/afr-common.c:1298 if error != 0
>
>   ---> refresh_done:
>            afr_txn_refresh_done(frame, this, error);
>
> And it never triggered (despite split-brain messages and my crapola
> message).
>
> So I'm unable to explain this transition. I'm also not a gdb expert.
> I still see the same backtrace though:
>
>   #1  0x00007fff68938d7b in afr_txn_refresh_done (
>       frame=frame@entry=0x7ffd540ed8e8, this=this@entry=0x7fff64013720,
>       err=5, err@entry=0) at afr-common.c:1222
>   #2  0x00007fff689391f0 in afr_inode_refresh_done (
>       frame=frame@entry=0x7ffd540ed8e8, this=this@entry=0x7fff64013720,
>       error=0) at afr-common.c:1299
>
> Is there other advice you might have for me to try?
>
> I'm very eager to solve this problem, which is why I'm staying up late
> to get machine time. I must go to bed now. I look forward to another
> shot tomorrow night if you have more ideas to try.
>
> Erik

Erik Jacobson
Software Engineer
erik.jacobson@hpe.com
+1 612 851 0550 Office
Eagan, MN
hpe.com