Strahil Nikolov
2020-Mar-30 19:35 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
On March 30, 2020 7:54:59 PM GMT+03:00, Erik Jacobson <erik.jacobson at hpe.com> wrote:
>> Hi Erik,
>>
>> Sadly I didn't have the time to take a look in your logs, but I would
>> like to ask you whether you have statistics of the network bandwidth
>> usage.
>>
>> Could it be possible that the gNFS server is starved for bandwidth
>> and fails to reach all bricks, leading to 'split-brain' errors?
>
> I understand. I doubt there is a bandwidth issue but I'll add this to my
> checks. We have 288 nodes per server normally and they run fine with all
> servers up. The 76 number is just what we happened to have access to on
> an internal system.
>
> Question: What you mentioned above, and a feeling I have too personally,
> is -- is the split-brain error actually a generic catch-all error for
> not being able to get access to a file? So when it says "split-brain",
> could it really mean any type of access error? Could it also be given
> when there is an IO timeout or something?
>
> I'm starting to break open the source code to look around, but I think
> my head will explode before I understand it enough. I will still give it
> a shot.
>
> I have access to this system until later tonight. Then it goes away. We
> have duplicated it on another system that stays, but the machine
> internally is so contended for that I wouldn't get a time slot until
> later in the week anyway. Trying to make as much use of this "gift"
> machine as I can :) :)
>
> Thanks again for the replies so far.
>
> Erik

Hey Erik,

Sadly I am not a developer, so I can't answer your questions.

Still, bandwidth starvation looks like a possible reason (at least to me), although error messages and timeouts should then fill the logs.

I can recommend that you increase the logging for both the bricks and the volume to the maximum and try to reproduce the issue. Keep in mind that the logs can grow very fast.

Best Regards,
Strahil Nikolov
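P.S. Roughly what I mean by raising the logging -- the option names are from memory, so please verify them against your version (for example with 'gluster volume set help'); <VOLNAME> is a placeholder:

# raise brick-side and client-side log levels to the most verbose setting
gluster volume set <VOLNAME> diagnostics.brick-log-level TRACE
gluster volume set <VOLNAME> diagnostics.client-log-level TRACE

# and put them back to the defaults once you have reproduced the issue
gluster volume reset <VOLNAME> diagnostics.brick-log-level
gluster volume reset <VOLNAME> diagnostics.client-log-level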
Erik Jacobson
2020-Mar-30 21:04 UTC
[Gluster-users] gnfs split brain when 1 server in 3x1 down (high load) - help request
> Sadly I am not a developer, so I can't answer your questions.

I'm not a FS or network developer either. I think there is a joke about
playing one on TV, but maybe it's Netflix now.

Enabling certain debug options made too much information for me to watch
personally (but an expert could probably get through it). So I started
putting targeted 'print' (gf_msg) statements in the code to see how it
got its way to split-brain. Maybe this will ring a bell for someone.

I can tell the only way we enter the split-brain path is through the
first if statement of afr_read_txn_refresh_done(). This means
afr_read_txn_refresh_done() itself was passed a non-zero "err", and it
appears thin_arbiter_count was not set (which makes sense; I'm using
1x3, not a thin arbiter).

So we jump to the readfn label, and read_subvol should still be -1. If I
read right, it must mean that this if didn't return true, because my
print statement didn't appear:

    if ((ret == 0) && spb_choice >= 0) {

So we're still with the original read_subvol == -1, which gets us to the
split_brain message.

So now I will try to learn why afr_read_txn_refresh_done() would have
'err' set in the first place. I will also learn about
afr_inode_split_brain_choice_get(). Those seem to be the two ways to
have avoided falling into the split-brain hole here.

I put debug statements in these locations. I will mark with !!!!!! what
I see:

diff -Narup glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c
--- glusterfs-7.2-orig/xlators/cluster/afr/src/afr-read-txn.c	2020-01-15 11:43:53.887894293 -0600
+++ glusterfs-7.2-new/xlators/cluster/afr/src/afr-read-txn.c	2020-03-30 15:45:02.917104321 -0500
@@ -279,10 +279,14 @@ afr_read_txn_refresh_done(call_frame_t *
     priv = this->private;
 
     if (err) {
-        if (!priv->thin_arbiter_count)
+        if (!priv->thin_arbiter_count) {
+            gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg crapola 1st if in afr_read_txn_refresh_done() !priv->thin_arbiter_count -- goto to readfn");

!!!!!!!!!!!!!!!!!!!!!!
We hit this error condition and jump to readfn below
!!!!!!!!!!!!!!!!!!!!!!

             goto readfn;
-        if (err != EINVAL)
+        }
+        if (err != EINVAL) {
+            gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj 2nd if in afr_read_txn_refresh_done() err != EINVAL, goto readfn");
             goto readfn;
+        }
         /* We need to query the good bricks and/or thin-arbiter.*/
         afr_ta_read_txn_synctask(frame, this);
         return 0;
@@ -291,6 +295,8 @@ afr_read_txn_refresh_done(call_frame_t *
     read_subvol = afr_read_subvol_select_by_policy(inode, this, local->readable, NULL);
     if (read_subvol == -1) {
+        gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg whoops read_subvol returned -1, going to readfn");
+
         err = EIO;
         goto readfn;
     }
@@ -304,11 +310,15 @@ afr_read_txn_refresh_done(call_frame_t *
 readfn:
     if (read_subvol == -1) {
         ret = afr_inode_split_brain_choice_get(inode, this, &spb_choice);
-        if ((ret == 0) && spb_choice >= 0)
+        if ((ret == 0) && spb_choice >= 0) {

!!!!!!!!!!!!!!!!!!!!!!
We never get here, afr_inode_split_brain_choice_get() must not have
returned what was needed to enter.
!!!!!!!!!!!!!!!!!!!!!!

+            gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg read_subvol was -1 to begin with split brain choice found: %d", spb_choice);
             read_subvol = spb_choice;
+        }
     }
 
     if (read_subvol == -1) {
+        gf_msg(this->name, GF_LOG_ERROR,0,0,"erikj dbg verify this shows up above split-brain error");

!!!!!!!!!!!!!!!!!!!!!!
We hit here. Game over player.
!!!!!!!!!!!!!!!!!!!!!!

+
         AFR_SET_ERROR_AND_CHECK_SPLIT_BRAIN(-1, err);
     }
     afr_read_txn_wind(frame, this, read_subvol);
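In case it helps anyone follow along, here is a tiny standalone toy
model of the branch logic above -- my own stand-in variables and a
printf, NOT gluster code -- showing why a non-zero 'err', no thin
arbiter, and no stored split-brain choice lands us on the split-brain
message:

/* Toy model (not gluster code): condensed from the hunks above, with my
 * own stand-in values, just to make the decision path explicit. */
#include <stdio.h>

int main(void)
{
    int err = 5;                 /* afr_read_txn_refresh_done() was handed a non-zero err */
    int thin_arbiter_count = 0;  /* plain 1x3 replica, no thin arbiter                    */
    int read_subvol = -1;        /* starts out unset                                      */
    int spb_choice = -1;         /* no split-brain choice configured                      */
    int ret = 0;

    if (err) {
        if (!thin_arbiter_count)
            goto readfn;         /* first if: skip subvol selection entirely              */
    }

    read_subvol = 0;             /* policy-based selection -- never reached in this case  */

readfn:
    if (read_subvol == -1) {
        /* stand-in for the afr_inode_split_brain_choice_get() + spb_choice check */
        if ((ret == 0) && spb_choice >= 0)
            read_subvol = spb_choice;
    }

    if (read_subvol == -1)       /* still -1, so the split-brain error path is taken      */
        printf("would log: split brain observed, err=%d\n", err);

    return 0;
}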