thr3ads.net - Gluster users - [Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware [Dec 2017]

Karthik Subrahmanya

2017-Dec-21 06:18 UTC

[Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware

Hey,

Can you give us the volume info output for this volume?
Why are you not able to get the xattrs from the arbiter brick? You get them
the same way as you do on the data bricks.
The changelog xattrs are named trusted.afr.virt_images-client-{1,2,3} in
the getxattr outputs you have provided.
Did you do a remove-brick and add-brick at any time? Otherwise they would
usually be trusted.afr.virt_images-client-{0,1,2}.

To get out of this state you can do what Ben Turner suggested: select
the source copy and change the xattrs manually.
I suspect that this has hit the bug where the arbiter becomes the source
for data heal. But to confirm that, we also need the xattrs from the arbiter brick.

Regards,
Karthik


On Thu, Dec 21, 2017 at 9:55 AM, Ben Turner <bturner at redhat.com> wrote:
> Here is the process for resolving split brain on replica 2:
>
> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html
>
> It should be pretty much the same for replica 3, you change the xattrs
> with something like:
>
> # setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a
>
> When I try to decide which copy to use I normally run things like:
>
> # stat /<path to brick>/path/to/file
>
> Check out the access and change times of the file on the back end bricks.
> I normally pick the copy with the latest access / change times.  I'll also
> check:
>
> # md5sum /<path to brick>/path/to/file
>
> Compare the hashes of the file on both bricks to see if the data actually
> differs.  If the data is the same it makes choosing the proper replica
> easier.
>
> Any idea how you got in this situation?  Did you have a loss of network
> connectivity?  I see you are using server side quorum, maybe check the logs
> for any loss of quorum?  I wonder if there was a loss of quorum and there
> was some sort of race condition hit:
>
> http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls
>
> "Unlike in client-quorum where the volume becomes read-only when quorum is
> lost, loss of server-quorum in a particular node makes glusterd kill the
> brick processes on that node (for the participating volumes) making even
> reads impossible."
>
> I wonder if the killing of brick processes could have led to some sort of
> race condition where writes were serviced on one brick / the arbiter and
> not the other?
>
> If you can find a reproducer for this please open a BZ with it, I have
> been seeing something similar (I think) but I haven't been able to run the
> issue down yet.
>
> -b
>
> ----- Original Message -----
> > From: "Henrik Juul Pedersen" <hjp at liab.dk>
> > To: gluster-users at gluster.org
> > Cc: "Henrik Juul Pedersen" <henrik at corepower.dk>
> > Sent: Wednesday, December 20, 2017 1:26:37 PM
> > Subject: [Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
> >
> > Hi,
> >
> > I have the following volume:
> >
> > Volume Name: virt_images
> > Type: Replicate
> > Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
> > Status: Started
> > Snapshot Count: 2
> > Number of Bricks: 1 x (2 + 1) = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: virt3:/data/virt_images/brick
> > Brick2: virt2:/data/virt_images/brick
> > Brick3: printserver:/data/virt_images/brick (arbiter)
> > Options Reconfigured:
> > features.quota-deem-statfs: on
> > features.inode-quota: on
> > features.quota: on
> > features.barrier: disable
> > features.scrub: Active
> > features.bitrot: on
> > nfs.rpc-auth-allow: on
> > server.allow-insecure: on
> > user.cifs: off
> > features.shard: off
> > cluster.shd-wait-qlength: 10000
> > cluster.locking-scheme: granular
> > cluster.data-self-heal-algorithm: full
> > cluster.server-quorum-type: server
> > cluster.quorum-type: auto
> > cluster.eager-lock: enable
> > network.remote-dio: enable
> > performance.low-prio-threads: 32
> > performance.io-cache: off
> > performance.read-ahead: off
> > performance.quick-read: off
> > nfs.disable: on
> > transport.address-family: inet
> > server.outstanding-rpc-limit: 512
> >
> > After a server reboot (brick 1) a single file has become unavailable:
> > # touch fedora27.qcow2
> > touch: setting times of 'fedora27.qcow2': Input/output error
> >
> > Looking at the split brain status from the client side cli:
> > # getfattr -n replica.split-brain-status fedora27.qcow2
> > # file: fedora27.qcow2
> > replica.split-brain-status="The file is not under data or metadata split-brain"
> >
> > However, in the client side log, a split brain is mentioned:
> > [2017-12-20 18:05:23.570762] E [MSGID: 108008]
> > [afr-transaction.c:2629:afr_write_txn_refresh_done]
> > 0-virt_images-replicate-0: Failing SETATTR on gfid
> > 7a36937d-52fc-4b55-a932-99e2328f02ba: split-brain observed.
> > [Input/output error]
> > [2017-12-20 18:05:23.576046] W [MSGID: 108027]
> > [afr-common.c:2733:afr_discover_done] 0-virt_images-replicate-0: no
> > read subvols for /fedora27.qcow2
> > [2017-12-20 18:05:23.578149] W [fuse-bridge.c:1153:fuse_setattr_cbk]
> > 0-glusterfs-fuse: 182: SETATTR() /fedora27.qcow2 => -1 (Input/output
> > error)
> >
> > = Server side
> >
> > No mention of a possible split brain:
> > # gluster volume heal virt_images info split-brain
> > Brick virt3:/data/virt_images/brick
> > Status: Connected
> > Number of entries in split-brain: 0
> >
> > Brick virt2:/data/virt_images/brick
> > Status: Connected
> > Number of entries in split-brain: 0
> >
> > Brick printserver:/data/virt_images/brick
> > Status: Connected
> > Number of entries in split-brain: 0
> >
> > The info command shows the file:
> > # gluster volume heal virt_images info
> > Brick virt3:/data/virt_images/brick
> > /fedora27.qcow2
> > Status: Connected
> > Number of entries: 1
> >
> > Brick virt2:/data/virt_images/brick
> > /fedora27.qcow2
> > Status: Connected
> > Number of entries: 1
> >
> > Brick printserver:/data/virt_images/brick
> > /fedora27.qcow2
> > Status: Connected
> > Number of entries: 1
> >
> >
> > The heal and heal full commands do nothing, and I can't find
> > anything in the logs about them trying and failing to fix the file.
> >
> > Trying to manually resolve the split brain from cli gives the following:
> > # gluster volume heal virt_images split-brain source-brick
> > virt3:/data/virt_images/brick /fedora27.qcow2
> > Healing /fedora27.qcow2 failed: File not in split-brain.
> > Volume heal failed.
> >
> > The attrs from virt2 and virt3 are as follows:
> > [root@virt2 brick]# getfattr -d -m . -e hex fedora27.qcow2
> > # file: fedora27.qcow2
> > trusted.afr.dirty=0x000000000000000000000000
> > trusted.afr.virt_images-client-1=0x000002280000000000000000
> > trusted.afr.virt_images-client-3=0x000000000000000000000000
> > trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563
> > trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
> > trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
> > trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001
> > trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
> >
> > # file: fedora27.qcow2
> > trusted.afr.dirty=0x000000000000000000000000
> > trusted.afr.virt_images-client-2=0x000003ef0000000000000000
> > trusted.afr.virt_images-client-3=0x000000000000000000000000
> > trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a
> > trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
> > trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
> > trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001
> > trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
> >
> > I don't know how to find similar information from the arbiter...
> >
> > Versions are the same on all three systems:
> > # glusterd --version
> > glusterfs 3.12.2
> >
> > # gluster volume get all cluster.op-version
> > Option                                  Value
> > ------                                  -----
> > cluster.op-version                      31202
> >
> > I might try upgrading to version 3.13.0 tomorrow, but I want to hear
> > you out first.
> >
> > How do I fix this? Do I have to manually change the file attributes?
> >
> > Also, in the guides for manual resolution through setfattr, all the
> > bricks are listed with a "trusted.afr.<volume>-client-<brick>". But in
> > my system (as can be seen above), I only see the other bricks? So
> > which attributes should be changed into what?
> >
> >
> >
> > I hope someone might know a solution. If you need any more information
> > I'll try and provide it. I can probably change the virtual machine to
> > another image for now.
> >
> > Best regards,
> > Henrik Juul Pedersen
> > LIAB ApS
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
> >

Henrik Juul Pedersen

2017-Dec-21 17:12 UTC

Hi Karthik and Ben,

I'll try and reply to you inline.

On 21 December 2017 at 07:18, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
> Hey,
>
> Can you give us the volume info output for this volume?
# gluster volume info virt_images

Volume Name: virt_images
Type: Replicate
Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
Status: Started
Snapshot Count: 2
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: virt3:/data/virt_images/brick
Brick2: virt2:/data/virt_images/brick
Brick3: printserver:/data/virt_images/brick (arbiter)
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.barrier: disable
features.scrub: Active
features.bitrot: on
nfs.rpc-auth-allow: on
server.allow-insecure: on
user.cifs: off
features.shard: off
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.disable: on
transport.address-family: inet
server.outstanding-rpc-limit: 512
> Why are you not able to get the xattrs from arbiter brick? It is the same
> way as you do it on data bricks.
Yes I must have confused myself yesterday somehow, here it is in full
from all three bricks:

Brick 1 (virt2): # getfattr -d -m . -e hex fedora27.qcow2
# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-1=0x000002280000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000
trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001

Brick 2 (virt3): # getfattr -d -m . -e hex fedora27.qcow2
# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-2=0x000003ef0000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000
trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001

Brick 3 - arbiter (printserver): # getfattr -d -m . -e hex fedora27.qcow2
# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-1=0x000002280000000000000000
trusted.bit-rot.version=0x31000000000000005a39237200073206
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000000000000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001

I was expecting trusted.afr.virt_images-client-{1,2,3} on all bricks?
> The changelog xattrs are named trusted.afr.virt_images-client-{1,2,3} in the
> getxattr outputs you have provided.
> Did you do a remove-brick and add-brick any time? Otherwise it will be
> trusted.afr.virt_images-client-{0,1,2} usually.
Yes, the bricks were moved around initially; brick 0 was re-created as
brick 2, and the arbiter was added later on as well.
>
> To overcome this scenario you can do what Ben Turner had suggested. Select
> the source copy and change the xattrs manually.
I won't mind doing that, but again, the guides assume that I have
trusted.afr.virt_images-client-{1,2,3} on all bricks, so I'm not sure
what to change to what, where.
> I am suspecting that it has hit the arbiter becoming source for data heal
> bug. But to confirm that we need the xattrs on the arbiter brick also.
>
> Regards,
> Karthik
>
>
> On Thu, Dec 21, 2017 at 9:55 AM, Ben Turner <bturner at redhat.com> wrote:
>>
>> Here is the process for resolving split brain on replica 2:
>>
>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html
>>
>> It should be pretty much the same for replica 3, you change the xattrs
>> with something like:
>>
>> # setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a
>>
>> When I try to decide which copy to use I normally run things like:
>>
>> # stat /<path to brick>/path/to/file
>>
>> Check out the access and change times of the file on the back end bricks.
>> I normally pick the copy with the latest access / change times.  I'll also
>> check:
>>
>> # md5sum /<path to brick>/path/to/file
>>
>> Compare the hashes of the file on both bricks to see if the data actually
>> differs.  If the data is the same it makes choosing the proper replica
>> easier.
The files on the bricks differ, so something was changed and
not replicated.
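
A minimal sketch of the stat / md5sum comparison Ben describes, meant to be run once on each data brick so the two output lines can be compared side by side. The function name brick_fingerprint is made up for illustration, and the `stat -c` format assumes GNU coreutils; it only reads the file.

```shell
#!/bin/sh
# brick_fingerprint (hypothetical helper): print
#   "<md5> <size-bytes> <mtime-epoch> <ctime-epoch>"
# for one file on a brick. Assumes GNU coreutils (md5sum, stat -c).
brick_fingerprint() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    sum=$(md5sum -- "$1" | cut -d' ' -f1)
    meta=$(stat -c '%s %Y %Z' -- "$1")   # size, mtime, ctime
    echo "$sum $meta"
}

# Example (brick path and file name taken from the thread):
#   brick_fingerprint /data/virt_images/brick/fedora27.qcow2
```

If the first field matches on both bricks the data is identical and the choice of source matters less; if it differs, the size and time fields are what Ben suggests using to pick a copy.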

Thanks for the input. I've looked at that, but couldn't get it to fit,
as I don't have trusted.afr.virt_images-client-{1,2,3} on all bricks.
>>
>> Any idea how you got in this situation?  Did you have a loss of network
>> connectivity?  I see you are using server side quorum, maybe check the logs
>> for any loss of quorum?  I wonder if there was a loss of quorum and there
>> was some sort of race condition hit:
>>
>> http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls
>>
>> "Unlike in client-quorum where the volume becomes read-only when quorum is
>> lost, loss of server-quorum in a particular node makes glusterd kill the
>> brick processes on that node (for the participating volumes) making even
>> reads impossible."
I might have had a loss of server quorum, but I can't seem to see
exactly why or when from the logs:

Times are synchronized between servers. virt3 was rebooted for
service at 17:29:39. The shutdown logs show an issue with unmounting
the bricks, probably because glusterd was still running:
Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/virt_images.
Dec 20 17:29:39 virt3 systemd[1]: data-filserver.mount: Mount process
exited, code=exited status=32
Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/filserver.
Dec 20 17:29:39 virt3 systemd[1]: Unmounted /virt_images.
Dec 20 17:29:39 virt3 systemd[1]: Stopped target Network is Online.
Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered
file-system server...
Dec 20 17:29:39 virt3 systemd[1]: Stopping Network Name Resolution...
Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered
file-system server.

I believe it was around this time that the virtual machine (running on
virt2) was stopped by qemu.


Brick 1 (virt2) only experienced loss of quorum when starting gluster
(glusterd.log confirms this):
Dec 20 17:22:03 virt2 systemd[1]: Starting GlusterFS, a clustered
file-system server...
Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997472] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume filserver. Stopping local
bricks.
Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997666] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume virt_images. Stopping
local bricks.
Dec 20 17:22:06 virt2 systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.387238] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.390417] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.
-- Reboot --
Dec 20 18:41:35 virt2 systemd[1]: Starting GlusterFS, a clustered
file-system server...
Dec 20 18:41:41 virt2 systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.387633] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.391080] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.


Brick 2 (virt3) shows a network outage on the 19th, but everything
worked fine afterwards:
Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.382207] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.387324] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.
Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered
file-system server...
Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered
file-system server.
-- Reboot --
Dec 20 17:30:21 virt3 systemd[1]: Starting GlusterFS, a clustered
file-system server...
Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.826828] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume filserver. Stopping local
bricks.
Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.827188] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume virt_images. Stopping
local bricks.
Dec 20 17:30:23 virt3 systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.488000] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.491446] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.
Dec 20 18:31:06 virt3 systemd[1]: Stopping GlusterFS, a clustered
file-system server...
Dec 20 18:31:06 virt3 systemd[1]: Stopped GlusterFS, a clustered
file-system server.
-- Reboot --
Dec 20 18:31:46 virt3 systemd[1]: Starting GlusterFS, a clustered
file-system server...
Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.958818] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume filserver. Stopping local
bricks.
Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.959168] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume virt_images. Stopping
local bricks.
Dec 20 18:31:47 virt3 systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 20 18:33:10 virt3 glusterd[386]: [2017-12-20 17:33:10.156180] C
[MSGID: 106001]
[glusterd-volume-ops.c:1534:glusterd_op_stage_start_volume]
0-management: Server quorum not met. Rejecting operation.
Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.440395] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.446203] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.

Brick 3 - arbiter (printserver) shows no loss of quorum at that time
(again, glusterd.log confirms):
Dec 19 15:33:24 printserver systemd[1]: Starting GlusterFS, a
clustered file-system server...
Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19
14:33:26.432369] C [MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume filserver. Stopping local
bricks.
Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19
14:33:26.432606] C [MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume virt_images. Stopping
local bricks.
Dec 19 15:33:26 printserver systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19
14:34:18.158756] C [MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19
14:34:18.162242] C [MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.
Dec 20 18:28:52 printserver systemd[1]: Stopping GlusterFS, a
clustered file-system server...
Dec 20 18:28:52 printserver systemd[1]: Stopped GlusterFS, a clustered
file-system server.
-- Reboot --
Dec 20 18:30:40 printserver systemd[1]: Starting GlusterFS, a
clustered file-system server...
Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20
17:30:42.441675] C [MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume filserver. Stopping local
bricks.
Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20
17:30:42.441929] C [MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume virt_images. Stopping
local bricks.
Dec 20 18:30:42 printserver systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20
17:33:49.005534] C [MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20
17:33:49.008010] C [MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.
>>
>> I wonder if the killing of brick processes could have led to some sort of
>> race condition where writes were serviced on one brick / the arbiter and not
>> the other?
>>
>> If you can find a reproducer for this please open a BZ with it, I have
>> been seeing something similar (I think) but I haven't been able to run the
>> issue down yet.
>>
>> -b
I'm not sure if I can reproduce this; a lot has been going on in my
setup the past few days (trying to tune some horrible small-file and
file creation/deletion performance).

Thanks for looking into this with me.

Best regards,
Henrik Juul Pedersen
LIAB ApS

Karthik Subrahmanya

2017-Dec-22 06:26 UTC

Hi Henrik,

Thanks for providing the required outputs. See my replies inline.

On Thu, Dec 21, 2017 at 10:42 PM, Henrik Juul Pedersen <hjp at liab.dk>
wrote:
> Hi Karthik and Ben,
>
> I'll try and reply to you inline.
>
> On 21 December 2017 at 07:18, Karthik Subrahmanya <ksubrahm at redhat.com>
> wrote:
> > Hey,
> >
> > Can you give us the volume info output for this volume?
>
> # gluster volume info virt_images
>
> Volume Name: virt_images
> Type: Replicate
> Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
> Status: Started
> Snapshot Count: 2
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: virt3:/data/virt_images/brick
> Brick2: virt2:/data/virt_images/brick
> Brick3: printserver:/data/virt_images/brick (arbiter)
> Options Reconfigured:
> features.quota-deem-statfs: on
> features.inode-quota: on
> features.quota: on
> features.barrier: disable
> features.scrub: Active
> features.bitrot: on
> nfs.rpc-auth-allow: on
> server.allow-insecure: on
> user.cifs: off
> features.shard: off
> cluster.shd-wait-qlength: 10000
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: enable
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> nfs.disable: on
> transport.address-family: inet
> server.outstanding-rpc-limit: 512
>
> > Why are you not able to get the xattrs from arbiter brick? It is the same
> > way as you do it on data bricks.
>
> Yes I must have confused myself yesterday somehow, here it is in full
> from all three bricks:
>
> Brick 1 (virt2): # getfattr -d -m . -e hex fedora27.qcow2
> # file: fedora27.qcow2
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.virt_images-client-1=0x000002280000000000000000
> trusted.afr.virt_images-client-3=0x000000000000000000000000
> trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563
> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001
> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
>
> Brick 2 (virt3): # getfattr -d -m . -e hex fedora27.qcow2
> # file: fedora27.qcow2
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.virt_images-client-2=0x000003ef0000000000000000
> trusted.afr.virt_images-client-3=0x000000000000000000000000
> trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a
> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001
> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
>
> Brick 3 - arbiter (printserver): # getfattr -d -m . -e hex fedora27.qcow2
> # file: fedora27.qcow2
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.virt_images-client-1=0x000002280000000000000000
> trusted.bit-rot.version=0x31000000000000005a39237200073206
> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000000000000000000000000001
> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
>
> I was expecting trusted.afr.virt_images-client-{1,2,3} on all bricks?
>
From AFR-V2 onwards we do not have self-blaming attrs, so you will see a brick blaming other bricks only.
For example, brick1 can blame brick2 and brick3, but not itself.
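
As a reading aid for the getfattr output quoted above: each trusted.afr.<volume>-client-N value is 12 bytes, read as three big-endian 32-bit counters of pending data, metadata and entry operations that the brick holds against brick N. A minimal POSIX shell sketch that decodes such a value (the function name decode_afr is made up for illustration; it only parses text and does not touch any brick):

```shell
#!/bin/sh
# decode_afr (hypothetical helper): split a 12-byte AFR changelog value,
# as printed by `getfattr -e hex`, into its three pending counters.
decode_afr() {
    hex=${1#0x}                      # drop the leading 0x, if present
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)     # bytes 0-3: data
    meta=$(printf '%s' "$hex" | cut -c9-16)    # bytes 4-7: metadata
    entry=$(printf '%s' "$hex" | cut -c17-24)  # bytes 8-11: entry
    echo "data=$((0x$data)) metadata=$((0x$meta)) entry=$((0x$entry))"
}

# Values copied from the thread:
decode_afr 0x000002280000000000000000   # the client-1 xattr
decode_afr 0x000003ef0000000000000000   # the client-2 xattr
```

Both values decode to a non-zero data counter with metadata and entry at zero, which suggests the two data bricks each hold pending data operations against the other, while the client-3 (arbiter) xattrs are all zero.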
>
> > The changelog xattrs are named trusted.afr.virt_images-client-{1,2,3} in the
> > getxattr outputs you have provided.
> > Did you do a remove-brick and add-brick any time? Otherwise it will be
> > trusted.afr.virt_images-client-{0,1,2} usually.
>
> Yes, the bricks was moved around initially; brick 0 was re-created as
> brick 2, and the arbiter was added later on as well.
>
> >
> > To overcome this scenario you can do what Ben Turner had suggested. Select
> > the source copy and change the xattrs manually.
>
> I won't mind doing that, but again, the guides assume that I have
> trusted.afr.virt_images-client-{1,2,3} on all bricks, so I'm not sure
> what to change to what, where.
> > I am suspecting that it has hit the arbiter becoming source for data heal
> > bug. But to confirm that we need the xattrs on the arbiter brick also.
> >
> > Regards,
> > Karthik
> >
> >
> > On Thu, Dec 21, 2017 at 9:55 AM, Ben Turner <bturner at redhat.com> wrote:
> >>
> >> Here is the process for resolving split brain on replica 2:
> >>
> >>
> >>
> >> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html
> >>
> >> It should be pretty much the same for replica 3, you change the xattrs
> >> with something like:
> >>
> >> # setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a
> >>
> >> When I try to decide which copy to use I normally run things like:
> >>
> >> # stat /<path to brick>/path/to/file
> >>
> >> Check out the access and change times of the file on the back end bricks.
> >> I normally pick the copy with the latest access / change times.  I'll also
> >> check:
> >>
> >> # md5sum /<path to brick>/path/to/file
> >>
> >> Compare the hashes of the file on both bricks to see if the data actually
> >> differs.  If the data is the same it makes choosing the proper replica
> >> easier.
>
> The files on the bricks differ, so there was something changed, and
> not replicated.
>
> Thanks for the input, I've looked at that, but couldn't get it to fit,
> as I don't have trusted.afr.virt_images-client-{1,2,3} on all bricks.

You can choose either copy as the good one, based on the latest ctime/mtime.
Before doing anything, keep a backup of both copies, so that if
something bad happens you will have the data safe.
Now choose one copy as good (based on timestamps, size, or choosing a brick
as source), and reset the xattrs set for it on the other brick. Then do a
lookup on that file from the mount.
That should resolve the issue.
Once you are done, please let us know the result.

Regards,
Karthik
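
The comparison suggested in this thread (size, timestamps, and a hash of each brick copy) can be collected in one pass. This is only a sketch: describe_copy and copies_differ are hypothetical helper names, and the brick paths in the usage line are placeholders, not paths taken from this setup. It reads the files directly on the bricks and changes nothing.

```python
import hashlib
import os


def describe_copy(path, chunk_size=1024 * 1024):
    """Return size, mtime, ctime and MD5 of one brick copy of a file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large VM images do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    info = os.stat(path)
    return {
        "path": path,
        "size": info.st_size,
        "mtime": info.st_mtime,
        "ctime": info.st_ctime,
        "md5": digest.hexdigest(),
    }


def copies_differ(paths):
    """True if the copies do not all have the same content hash."""
    return len({describe_copy(p)["md5"] for p in paths}) > 1


if __name__ == "__main__":
    import sys

    # Usage (placeholder paths): python compare.py /brick-a/file /brick-b/file
    for copy in sys.argv[1:]:
        print(describe_copy(copy))
```

If the hashes match, the choice of source does not affect the data; if they differ, the timestamps and sizes printed here are the inputs for picking the good copy by hand.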
>
> >>
> >> Any idea how you got in this situation?  Did you have a loss of NW
> >> connectivity?  I see you are using server side quorum, maybe check the logs
> >> for any loss of quorum?  I wonder if there was a loss of quorum and there
> >> was some sort of race condition hit:
> >>
> >>
> >> http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls
> >>
> >> "Unlike in client-quorum where the volume becomes read-only when quorum is
> >> lost, loss of server-quorum in a particular node makes glusterd kill the
> >> brick processes on that node (for the participating volumes) making even
> >> reads impossible."
>
> I might have had a loss of server quorum, but I can't seem to see
> exactly why or when from the logs:
>
> Times are synchronized between servers. Virt 3 was rebooted for
> service at 17:29:39. The shutdown logs show an issue with unmounting
> the bricks, probably because glusterd was still running:
> Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/virt_images.
> Dec 20 17:29:39 virt3 systemd[1]: data-filserver.mount: Mount process
> exited, code=exited status=32
> Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/filserver.
> Dec 20 17:29:39 virt3 systemd[1]: Unmounted /virt_images.
> Dec 20 17:29:39 virt3 systemd[1]: Stopped target Network is Online.
> Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered
> file-system server...
> Dec 20 17:29:39 virt3 systemd[1]: Stopping Network Name Resolution...
> Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered
> file-system server.
>
> I believe it was around this time, the virtual machine (running on
> virt2) was stopped by qemu.
>
>
> Brick 1 (virt2) only experienced loss of quorum when starting gluster
> (glusterd.log confirms this):
> Dec 20 17:22:03 virt2 systemd[1]: Starting GlusterFS, a clustered
> file-system server...
> Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997472] C
> [MSGID: 106002]
> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume filserver. Stopping local
> bricks.
> Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997666] C
> [MSGID: 106002]
> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume virt_images. Stopping
> local bricks.
> Dec 20 17:22:06 virt2 systemd[1]: Started GlusterFS, a clustered
> file-system server.
> Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.387238] C
> [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume filserver. Starting
> local bricks.
> Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.390417] C
> [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume virt_images. Starting
> local bricks.
> -- Reboot --
> Dec 20 18:41:35 virt2 systemd[1]: Starting GlusterFS, a clustered
> file-system server...
> Dec 20 18:41:41 virt2 systemd[1]: Started GlusterFS, a clustered
> file-system server.
> Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.387633] C
> [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume filserver. Starting
> local bricks.
> Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.391080] C
> [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume virt_images. Starting
> local bricks.
>
>
> Brick 2 (virt3) shows a network outage on the 19th, but everything
> worked fine afterwards:
> Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.382207] C
> [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume filserver. Starting
> local bricks.
> Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.387324] C
> [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume virt_images. Starting
> local bricks.
> Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered
> file-system server...
> Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered
> file-system server.
> -- Reboot --
> Dec 20 17:30:21 virt3 systemd[1]: Starting GlusterFS, a clustered
> file-system server...
> Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.826828] C
> [MSGID: 106002]
> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume filserver. Stopping local
> bricks.
> Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.827188] C
> [MSGID: 106002]
> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume virt_images. Stopping
> local bricks.
> Dec 20 17:30:23 virt3 systemd[1]: Started GlusterFS, a clustered
> file-system server.
> Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.488000] C
> [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume filserver. Starting
> local bricks.
> Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.491446] C
> [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume virt_images. Starting
> local bricks.
> Dec 20 18:31:06 virt3 systemd[1]: Stopping GlusterFS, a clustered
> file-system server...
> Dec 20 18:31:06 virt3 systemd[1]: Stopped GlusterFS, a clustered
> file-system server.
> -- Reboot --
> Dec 20 18:31:46 virt3 systemd[1]: Starting GlusterFS, a clustered
> file-system server...
> Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.958818] C
> [MSGID: 106002]
> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume filserver. Stopping local
> bricks.
> Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.959168] C
> [MSGID: 106002]
> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume virt_images. Stopping
> local bricks.
> Dec 20 18:31:47 virt3 systemd[1]: Started GlusterFS, a clustered
> file-system server.
> Dec 20 18:33:10 virt3 glusterd[386]: [2017-12-20 17:33:10.156180] C
> [MSGID: 106001]
> [glusterd-volume-ops.c:1534:glusterd_op_stage_start_volume]
> 0-management: Server quorum not met. Rejecting operation.
> Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.440395] C
> [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume filserver. Starting
> local bricks.
> Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.446203] C
> [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume virt_images. Starting
> local bricks.
>
> Brick 3 - arbiter (printserver) shows no loss of quorum at that time
> (again, glusterd.log confirms):
> Dec 19 15:33:24 printserver systemd[1]: Starting GlusterFS, a
> clustered file-system server...
> Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19
> 14:33:26.432369] C [MSGID: 106002]
> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume filserver. Stopping local
> bricks.
> Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19
> 14:33:26.432606] C [MSGID: 106002]
> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume virt_images. Stopping
> local bricks.
> Dec 19 15:33:26 printserver systemd[1]: Started GlusterFS, a clustered
> file-system server.
> Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19
> 14:34:18.158756] C [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume filserver. Starting
> local bricks.
> Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19
> 14:34:18.162242] C [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume virt_images. Starting
> local bricks.
> Dec 20 18:28:52 printserver systemd[1]: Stopping GlusterFS, a
> clustered file-system server...
> Dec 20 18:28:52 printserver systemd[1]: Stopped GlusterFS, a clustered
> file-system server.
> -- Reboot --
> Dec 20 18:30:40 printserver systemd[1]: Starting GlusterFS, a
> clustered file-system server...
> Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20
> 17:30:42.441675] C [MSGID: 106002]
> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume filserver. Stopping local
> bricks.
> Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20
> 17:30:42.441929] C [MSGID: 106002]
> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
> 0-management: Server quorum lost for volume virt_images. Stopping
> local bricks.
> Dec 20 18:30:42 printserver systemd[1]: Started GlusterFS, a clustered
> file-system server.
> Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20
> 17:33:49.005534] C [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume filserver. Starting
> local bricks.
> Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20
> 17:33:49.008010] C [MSGID: 106003]
> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume virt_images. Starting
> local bricks.
>
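
The quorum transitions in the logs above are easier to line up across the three nodes once they are reduced to (host, time, state, volume) records. The sketch below assumes each journal entry has first been rejoined into a single line (this archive wraps them), and that the bracketed timestamp is glusterd's own UTC time, which here runs one hour behind the syslog prefix. The name parse_quorum_events is hypothetical.

```python
import re
from datetime import datetime

# Matches glusterd "Server quorum lost/regained for volume X." messages.
QUORUM_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+glusterd\[\d+\]:\s+"
    r"\[(?P<utc>[\d-]+ [\d:]+)\.\d+\].*"
    r"Server quorum (?P<state>lost|regained) for volume (?P<volume>[\w-]+)\."
)


def parse_quorum_events(lines):
    """Return one dict per quorum lost/regained message; skip other lines."""
    events = []
    for line in lines:
        match = QUORUM_LINE.search(line.strip())
        if match is None:
            continue
        events.append({
            "host": match.group("host"),
            "time": datetime.strptime(match.group("utc"), "%Y-%m-%d %H:%M:%S"),
            "state": match.group("state"),
            "volume": match.group("volume"),
        })
    return events


if __name__ == "__main__":
    import sys

    # Usage: feed rejoined journal lines from all nodes on stdin.
    for event in sorted(parse_quorum_events(sys.stdin), key=lambda e: e["time"]):
        print(event["time"], event["host"], event["volume"], event["state"])
```

Sorting the merged events by time gives a single timeline per volume, which makes it simpler to see which node lost quorum relative to the virt3 reboot at 17:29:39.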
> >>
> >> I wonder if the killing of brick processes could have led to some sort of
> >> race condition where writes were serviced on one brick / the arbiter and not
> >> the other?
> >>
> >> If you can find a reproducer for this please open a BZ with it, I have
> >> been seeing something similar (I think) but I haven't been able to run the
> >> issue down yet.
> >>
> >> -b
>
> I'm not sure if I can replicate this, a lot has been going on in my
> setup the past few days (trying to tune some horrible small-file and
> file creation/deletion performance).
>
> Thanks for looking into this with me.
>
> Best regards,
> Henrik Juul Pedersen
> LIAB ApS