Karthik Subrahmanya
2017-Dec-22 06:26 UTC
[Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
Hi Henrik,

Thanks for providing the required outputs. See my replies inline.

On Thu, Dec 21, 2017 at 10:42 PM, Henrik Juul Pedersen <hjp at liab.dk> wrote:
> Hi Karthik and Ben,
>
> I'll try and reply to you inline.
>
> On 21 December 2017 at 07:18, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
> > Hey,
> >
> > Can you give us the volume info output for this volume?
>
> # gluster volume info virt_images
>
> Volume Name: virt_images
> Type: Replicate
> Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
> Status: Started
> Snapshot Count: 2
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: virt3:/data/virt_images/brick
> Brick2: virt2:/data/virt_images/brick
> Brick3: printserver:/data/virt_images/brick (arbiter)
> Options Reconfigured:
> features.quota-deem-statfs: on
> features.inode-quota: on
> features.quota: on
> features.barrier: disable
> features.scrub: Active
> features.bitrot: on
> nfs.rpc-auth-allow: on
> server.allow-insecure: on
> user.cifs: off
> features.shard: off
> cluster.shd-wait-qlength: 10000
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: enable
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> nfs.disable: on
> transport.address-family: inet
> server.outstanding-rpc-limit: 512
>
> > Why are you not able to get the xattrs from arbiter brick? It is the same
> > way as you do it on data bricks.
>
> Yes I must have confused myself yesterday somehow, here it is in full
> from all three bricks:
>
> Brick 1 (virt2): # getfattr -d -m .
-e hex fedora27.qcow2
> # file: fedora27.qcow2
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.virt_images-client-1=0x000002280000000000000000
> trusted.afr.virt_images-client-3=0x000000000000000000000000
> trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563
> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001
> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
>
> Brick 2 (virt3): # getfattr -d -m . -e hex fedora27.qcow2
> # file: fedora27.qcow2
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.virt_images-client-2=0x000003ef0000000000000000
> trusted.afr.virt_images-client-3=0x000000000000000000000000
> trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a
> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001
> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
>
> Brick 3 - arbiter (printserver): # getfattr -d -m .
-e hex fedora27.qcow2
> # file: fedora27.qcow2
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.virt_images-client-1=0x000002280000000000000000
> trusted.bit-rot.version=0x31000000000000005a39237200073206
> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000000000000000000000000001
> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
>
> I was expecting trusted.afr.virt_images-client-{1,2,3} on all bricks?

From AFR-V2 onwards we do not have self-blaming attrs, so you will see a brick
blaming other bricks only. For example, brick1 can blame brick2 & brick3, not itself.

> > The changelog xattrs are named trusted.afr.virt_images-client-{1,2,3} in the
> > getxattr outputs you have provided.
> > Did you do a remove-brick and add-brick any time? Otherwise it will be
> > trusted.afr.virt_images-client-{0,1,2} usually.
>
> Yes, the bricks were moved around initially; brick 0 was re-created as
> brick 2, and the arbiter was added later on as well.
>
> > To overcome this scenario you can do what Ben Turner had suggested. Select
> > the source copy and change the xattrs manually.
>
> I won't mind doing that, but again, the guides assume that I have
> trusted.afr.virt_images-client-{1,2,3} on all bricks, so I'm not sure
> what to change to what, where.
>
> > I am suspecting that it has hit the arbiter becoming source for data heal
> > bug. But to confirm that we need the xattrs on the arbiter brick also.
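[Editor's note: each trusted.afr.* changelog value above packs three big-endian 32-bit counters: pending data, metadata, and entry operations against the blamed brick. A minimal Python sketch of the decoding (the `decode_afr` helper name is mine, not a Gluster tool):]

```python
import struct

def decode_afr(hex_value: str):
    """Split a trusted.afr.* changelog value into its three big-endian
    32-bit counters: (pending data, metadata, entry) operations."""
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    return struct.unpack(">III", raw[:12])

# virt2's accusation of client-1: 0x228 = 552 unsynced data writes.
print(decode_afr("0x000002280000000000000000"))  # (552, 0, 0)
# virt3's accusation of client-2: 0x3ef = 1007 unsynced data writes.
print(decode_afr("0x000003ef0000000000000000"))  # (1007, 0, 0)
```

[So both data bricks hold hundreds of pending data operations against each other, which is what makes this a data split-brain.]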
> >
> > Regards,
> > Karthik
> >
> > On Thu, Dec 21, 2017 at 9:55 AM, Ben Turner <bturner at redhat.com> wrote:
> >>
> >> Here is the process for resolving split brain on replica 2:
> >>
> >> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html
> >>
> >> It should be pretty much the same for replica 3, you change the xattrs
> >> with something like:
> >>
> >> # setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a
> >>
> >> When I try to decide which copy to use I normally run things like:
> >>
> >> # stat /<path to brick>/path/to/file
> >>
> >> Check out the access and change times of the file on the back end bricks.
> >> I normally pick the copy with the latest access / change times. I'll also check:
> >>
> >> # md5sum /<path to brick>/path/to/file
> >>
> >> Compare the hashes of the file on both bricks to see if the data actually
> >> differs. If the data is the same it makes choosing the proper replica easier.
>
> The files on the bricks differ, so there was something changed, and
> not replicated.
>
> Thanks for the input, I've looked at that, but couldn't get it to fit,
> as I don't have trusted.afr.virt_images-client-{1,2,3} on all bricks.

You can choose any one of the copies as good, based on the latest ctime/mtime.
Before doing anything, keep a backup of both copies, so that if something bad
happens you will have the data safe.
Now choose one copy as good (based on timestamps/size/choosing a brick as
source), and reset the xattrs set for that brick on the other brick. Then do a
lookup on that file from the mount. That should resolve the issue.
Once you are done, please let us know the result.

Regards,
Karthik

> >>
> >> Any idea how you got in this situation? Did you have a loss of NW
> >> connectivity? I see you are using server side quorum, maybe check the logs
> >> for any loss of quorum?
> >> I wonder if there was a loss of quorum and there
> >> was some sort of race condition hit:
> >>
> >> http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls
> >>
> >> "Unlike in client-quorum where the volume becomes read-only when quorum is
> >> lost, loss of server-quorum in a particular node makes glusterd kill the
> >> brick processes on that node (for the participating volumes) making even
> >> reads impossible."
>
> I might have had a loss of server quorum, but I can't seem to see
> exactly why or when from the logs:
>
> Times are synchronized between servers. Virt 3 was rebooted for
> service at 17:29:39. The shutdown logs show an issue with unmounting
> the bricks, probably because glusterd was still running:
> Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/virt_images.
> Dec 20 17:29:39 virt3 systemd[1]: data-filserver.mount: Mount process exited, code=exited status=32
> Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/filserver.
> Dec 20 17:29:39 virt3 systemd[1]: Unmounted /virt_images.
> Dec 20 17:29:39 virt3 systemd[1]: Stopped target Network is Online.
> Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered file-system server...
> Dec 20 17:29:39 virt3 systemd[1]: Stopping Network Name Resolution...
> Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered file-system server.
>
> I believe it was around this time, the virtual machine (running on
> virt2) was stopped by qemu.
>
> Brick 1 (virt2) only experienced loss of quorum when starting gluster
> (glusterd.log confirms this):
> Dec 20 17:22:03 virt2 systemd[1]: Starting GlusterFS, a clustered file-system server...
> Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997472] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume filserver. Stopping local bricks.
> Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997666] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume virt_images. Stopping local bricks.
> Dec 20 17:22:06 virt2 systemd[1]: Started GlusterFS, a clustered file-system server.
> Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.387238] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
> Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.390417] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
> -- Reboot --
> Dec 20 18:41:35 virt2 systemd[1]: Starting GlusterFS, a clustered file-system server...
> Dec 20 18:41:41 virt2 systemd[1]: Started GlusterFS, a clustered file-system server.
> Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.387633] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
> Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.391080] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
>
> Brick 2 (virt3) shows a network outage on the 19th, but everything
> worked fine afterwards:
> Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.382207] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
> Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.387324] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
> Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered file-system server...
> Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered file-system server.
> -- Reboot --
> Dec 20 17:30:21 virt3 systemd[1]: Starting GlusterFS, a clustered file-system server...
> Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.826828] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume filserver. Stopping local bricks.
> Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.827188] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume virt_images. Stopping local bricks.
> Dec 20 17:30:23 virt3 systemd[1]: Started GlusterFS, a clustered file-system server.
> Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.488000] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
> Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.491446] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
> Dec 20 18:31:06 virt3 systemd[1]: Stopping GlusterFS, a clustered file-system server...
> Dec 20 18:31:06 virt3 systemd[1]: Stopped GlusterFS, a clustered file-system server.
> -- Reboot --
> Dec 20 18:31:46 virt3 systemd[1]: Starting GlusterFS, a clustered file-system server...
> Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.958818] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume filserver. Stopping local bricks.
> Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.959168] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume virt_images. Stopping local bricks.
> Dec 20 18:31:47 virt3 systemd[1]: Started GlusterFS, a clustered file-system server.
> Dec 20 18:33:10 virt3 glusterd[386]: [2017-12-20 17:33:10.156180] C [MSGID: 106001] [glusterd-volume-ops.c:1534:glusterd_op_stage_start_volume] 0-management: Server quorum not met. Rejecting operation.
> Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.440395] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
> Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.446203] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
>
> Brick 3 - arbiter (printserver) shows no loss of quorum at that time
> (again, glusterd.log confirms):
> Dec 19 15:33:24 printserver systemd[1]: Starting GlusterFS, a clustered file-system server...
> Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19 14:33:26.432369] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume filserver. Stopping local bricks.
> Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19 14:33:26.432606] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume virt_images. Stopping local bricks.
> Dec 19 15:33:26 printserver systemd[1]: Started GlusterFS, a clustered file-system server.
> Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19 14:34:18.158756] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
> Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19 14:34:18.162242] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
> Dec 20 18:28:52 printserver systemd[1]: Stopping GlusterFS, a clustered file-system server...
> Dec 20 18:28:52 printserver systemd[1]: Stopped GlusterFS, a clustered file-system server.
> -- Reboot --
> Dec 20 18:30:40 printserver systemd[1]: Starting GlusterFS, a clustered file-system server...
> Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20 17:30:42.441675] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume filserver. Stopping local bricks.
> Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20 17:30:42.441929] C [MSGID: 106002] [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume virt_images. Stopping local bricks.
> Dec 20 18:30:42 printserver systemd[1]: Started GlusterFS, a clustered file-system server.
> Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20 17:33:49.005534] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume filserver. Starting local bricks.
> Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20 17:33:49.008010] C [MSGID: 106003] [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume virt_images. Starting local bricks.
> >>
> >> I wonder if the killing of brick processes could have led to some sort of
> >> race condition where writes were serviced on one brick / the arbiter and not
> >> the other?
> >>
> >> If you can find a reproducer for this please open a BZ with it, I have
> >> been seeing something similar (I think) but I haven't been able to run the
> >> issue down yet.
> >>
> >> -b
>
> I'm not sure if I can replicate this, a lot has been going on in my
> setup the past few days (trying to tune some horrible small-file and
> file creation/deletion performance).
>
> Thanks for looking into this with me.
>
> Best regards,
> Henrik Juul Pedersen
> LIAB ApS
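[Editor's note: Ben's stat/md5sum heuristic can be sketched as a small script. This is an illustration only - `pick_source` is a hypothetical helper, not a Gluster command, and real recovery should still follow the backup-first procedure Karthik describes:]

```python
import hashlib
import os
import tempfile
from pathlib import Path

def pick_source(paths):
    """Ben's heuristic, sketched: if all copies hash the same, any brick
    works as the heal source; otherwise prefer the newest mtime."""
    digests = {hashlib.md5(Path(p).read_bytes()).hexdigest() for p in paths}
    if len(digests) == 1:
        return paths[0]            # identical data: the choice doesn't matter
    return max(paths, key=lambda p: os.stat(p).st_mtime)

# Simulate two diverged brick copies of the same file.
d = tempfile.mkdtemp()
a, b = os.path.join(d, "a"), os.path.join(d, "b")
Path(a).write_text("old contents")
Path(b).write_text("new contents")
os.utime(a, (1000, 1000))          # stale copy, old mtime
os.utime(b, (2000, 2000))          # fresher copy
print(pick_source([a, b]))         # prints the path of b, the newer copy
```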
Henrik Juul Pedersen
2017-Dec-22 12:31 UTC
[Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
Hi Karthik,

Thanks for the info. Maybe the documentation should be updated to
explain the different AFR versions, I know I was confused.

Also, when looking at the changelogs from my three bricks before fixing:

Brick 1:
trusted.afr.virt_images-client-1=0x000002280000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000

Brick 2:
trusted.afr.virt_images-client-2=0x000003ef0000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000

Brick 3 (arbiter):
trusted.afr.virt_images-client-1=0x000002280000000000000000

I would think that the changelog for client 1 should win by majority
vote? Or how does the self-healing process work? I assumed this as the
correct version, and reset client 2 on brick 2:

# setfattr -n trusted.afr.virt_images-client-2 -v 0x000000000000000000000000 fedora27.qcow2

I then did a directory listing, which might have started a heal, but
heal statistics show (I also did a full heal):

Starting time of crawl: Fri Dec 22 11:34:47 2017
Ending time of crawl: Fri Dec 22 11:34:47 2017
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1

Starting time of crawl: Fri Dec 22 11:39:29 2017
Ending time of crawl: Fri Dec 22 11:39:29 2017
Type of crawl: FULL
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1

I was immediately able to touch the file, so gluster was okay about
it; however, heal info still showed the file for a while:

# gluster volume heal virt_images info
Brick virt3:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick virt2:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick printserver:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Now heal info shows 0 entries, and the two data bricks have the same
md5sum, so it's back in sync.
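[Editor's note on the "majority vote" question: as I understand it, AFR does not take a vote over raw changelogs; a brick can serve as a heal source only if no other brick holds a non-zero changelog accusing it, and an arbiter holds no data so it can never be a data source. A simplified Python sketch under those assumptions - the brick names and the client-to-brick mapping are inferred from this thread, and the real algorithm also considers the metadata/entry counters and trusted.afr.dirty:]

```python
def find_data_sources(blames, data_bricks):
    """blames maps each brick to the set of bricks it accuses (i.e. has a
    non-zero pending-data changelog for). A data brick accused by nobody
    can act as a heal source; if none exists, the file is in split-brain."""
    accused = set().union(*blames.values())
    return [b for b in data_bricks if b not in accused]

# Before the fix: virt2 and the arbiter accuse virt3, virt3 accuses virt2.
# Every data brick is accused, so self-heal has no source -> split-brain.
before = {"virt2": {"virt3"}, "virt3": {"virt2"}, "arbiter": {"virt3"}}
print(find_data_sources(before, ["virt2", "virt3"]))  # []

# After resetting virt3's accusation (the setfattr above), virt2 is no
# longer accused by anyone and becomes the heal source.
after = {"virt2": {"virt3"}, "virt3": set(), "arbiter": {"virt3"}}
print(find_data_sources(after, ["virt2", "virt3"]))   # ['virt2']
```

[This also illustrates why clearing exactly one accusation, as Henrik did, was enough to unblock the heal.]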
I have a few questions after all of this:

1) How can a split brain happen in a replica 3 arbiter 1 setup with
both server- and client quorum enabled?
2) Why was it not able to self heal, when two bricks seemed in sync
with their changelogs?
3) Why could I not see the file in heal info split-brain?
4) Why could I not fix this through the cli split-brain resolution tool?
5) Is it possible to force a sync in a volume? Or maybe test sync
status? It might be smart to be able to "flush" changes when taking a
brick down for maintenance.
6) How am I supposed to monitor events like this? I have a gluster
volume with ~500,000 files; I need to be able to guarantee data
integrity and availability to the users.
7) Is glusterfs "production ready"? Because I find it hard to monitor
and thus trust in these setups. Also performance with small / many
files seems horrible at best - but that's for another discussion.

Thanks for all of your help, I'll continue to try and tweak some
performance out of this. :)

Best regards,
Henrik Juul Pedersen
LIAB ApS

On 22 December 2017 at 07:26, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>> Dec 20 18:30:42 printserver systemd[1]: Started GlusterFS, a clustered >> file-system server. >> Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20 >> 17:33:49.005534] C [MSGID: 106003] >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] >> 0-management: Server quorum regained for volume filserver. Starting >> local bricks. >> Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20 >> 17:33:49.008010] C [MSGID: 106003] >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] >> 0-management: Server quorum regained for volume virt_images. Starting >> local bricks. >> >> >> >> >> I wonder if the killing of brick processes could have led to some sort >> >> of >> >> race condition where writes were serviced on one brick / the arbiter >> >> and not >> >> the other? >> >> >> >> If you can find a reproducer for this please open a BZ with it, I have >> >> been seeing something similar(I think) but I haven't been able to run >> >> the >> >> issue down yet. >> >> >> >> -b >> >> I'm not sure if I can replicate this, a lot has been going on in my >> setup the past few days (trying to tune some horrible small-file and >> file creation/deletion performance). >> >> Thanks for looking into this with me. >> >> Best regards, >> Henrik Juul Pedersen >> LIAB ApS > >
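[Editorial note for readers of this archive: the trusted.afr.* changelog values quoted in this thread are 12-byte blobs holding three big-endian 32-bit counters: pending data, metadata, and entry operations, in that order. A minimal sketch for decoding them (the function name is ours, not from the thread):

```python
import struct

def decode_afr(xattr_hex: str) -> dict:
    """Decode a trusted.afr.* changelog value into its three
    pending-operation counters: data, metadata, entry."""
    raw = bytes.fromhex(xattr_hex.removeprefix("0x"))
    data, metadata, entry = struct.unpack(">III", raw[:12])
    return {"data": data, "metadata": metadata, "entry": entry}

# Values from the bricks in this thread:
print(decode_afr("0x000002280000000000000000"))  # brick 1 blaming client-1
print(decode_afr("0x000003ef0000000000000000"))  # brick 2 blaming client-2
```

So brick 1 records 0x228 = 552 pending data operations against client-1, and brick 2 records 0x3ef = 1007 against client-2 -- each data brick accusing the other, which is the state discussed below.]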
Karthik Subrahmanya
2017-Dec-22 13:17 UTC
[Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
Hey Henrik, Good to know that the issue got resolved. I will try to answer some of the questions you have. - The time taken to heal the file depends on its size. That's why you were seeing some delay in getting everything back to normal in the heal info output. - You did not hit the split-brain situation. In a split-brain, all the bricks blame the other bricks. But in your case the third brick was not blamed by any other brick. - It was not able to heal the file because the arbiter cannot be the source for a data heal. The other two data bricks were blaming each other, so heal was not able to decide on the source. This is the "arbiter becoming source for data heal" issue. We are working on the fix for this, and it will be shipped with the next release. - Since it was not in split-brain, you were not able to see this in heal info split-brain and not able to resolve it using the CLI for split-brain resolution. - You can use the heal command to sync data after brick maintenance. Once the brick comes up, the heal will be triggered automatically anyway. - You can use the heal info command to monitor the status of heal. Regards, Karthik On Fri, Dec 22, 2017 at 6:01 PM, Henrik Juul Pedersen <hjp at liab.dk> wrote:> Hi Karthik, > > Thanks for the info. Maybe the documentation should be updated to > explain the different AFR versions, I know I was confused. > > Also, when looking at the changelogs from my three bricks before fixing: > > Brick 1: > trusted.afr.virt_images-client-1=0x000002280000000000000000 > trusted.afr.virt_images-client-3=0x000000000000000000000000 > > Brick 2: > trusted.afr.virt_images-client-2=0x000003ef0000000000000000 > trusted.afr.virt_images-client-3=0x000000000000000000000000 > > Brick 3 (arbiter): > trusted.afr.virt_images-client-1=0x000002280000000000000000 > > I would think that the changelog for client 1 should win by majority > vote? Or how does the self-healing process work? 
> I assumed this as the correct version, and reset client 2 on brick 2: > # setfattr -n trusted.afr.virt_images-client-2 -v > 0x000000000000000000000000 fedora27.qcow2 > > I then did a directory listing, which might have started a heal, but > heal statistics show (I also did a full heal): > Starting time of crawl: Fri Dec 22 11:34:47 2017 > > Ending time of crawl: Fri Dec 22 11:34:47 2017 > > Type of crawl: INDEX > No. of entries healed: 0 > No. of entries in split-brain: 0 > No. of heal failed entries: 1 > > Starting time of crawl: Fri Dec 22 11:39:29 2017 > > Ending time of crawl: Fri Dec 22 11:39:29 2017 > > Type of crawl: FULL > No. of entries healed: 0 > No. of entries in split-brain: 0 > No. of heal failed entries: 1 > > I was immediately able to touch the file, so gluster was okay about > it; however, heal info still showed the file for a while: > # gluster volume heal virt_images info > Brick virt3:/data/virt_images/brick > /fedora27.qcow2 > Status: Connected > Number of entries: 1 > > Brick virt2:/data/virt_images/brick > /fedora27.qcow2 > Status: Connected > Number of entries: 1 > > Brick printserver:/data/virt_images/brick > /fedora27.qcow2 > Status: Connected > Number of entries: 1 > > > > Now heal info shows 0 entries, and the two data bricks have the same > md5sum, so it's back in sync. > > > > I have a few questions after all of this: > > 1) How can a split-brain happen in a replica 3 arbiter 1 setup with > both server- and client quorum enabled? > 2) Why was it not able to self-heal, when two bricks seemed in sync > with their changelogs? > 3) Why could I not see the file in heal info split-brain? > 4) Why could I not fix this through the CLI split-brain resolution tool? > 5) Is it possible to force a sync in a volume? Or maybe test sync > status? It might be smart to be able to "flush" changes when taking a > brick down for maintenance. > 6) How am I supposed to monitor events like this? 
I have a gluster > volume with ~500.000 files; I need to be able to guarantee data > integrity and availability to the users. > 7) Is glusterfs "production ready"? Because I find it hard to monitor > and thus trust in these setups. Also performance with small / many > files seems horrible at best - but that's for another discussion. > > Thanks for all of your help, I'll continue to try and tweak some > performance out of this. :) > > Best regards, > Henrik Juul Pedersen > LIAB ApS > > On 22 December 2017 at 07:26, Karthik Subrahmanya <ksubrahm at redhat.com> > wrote: > > Hi Henrik, > > > > Thanks for providing the required outputs. See my replies inline. > > > > On Thu, Dec 21, 2017 at 10:42 PM, Henrik Juul Pedersen <hjp at liab.dk> > wrote: > >> > >> Hi Karthik and Ben, > >> > >> I'll try and reply to you inline. > >> > >> On 21 December 2017 at 07:18, Karthik Subrahmanya <ksubrahm at redhat.com> > >> wrote: > >> > Hey, > >> > > >> > Can you give us the volume info output for this volume? 
> >> > >> # gluster volume info virt_images > >> > >> Volume Name: virt_images > >> Type: Replicate > >> Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594 > >> Status: Started > >> Snapshot Count: 2 > >> Number of Bricks: 1 x (2 + 1) = 3 > >> Transport-type: tcp > >> Bricks: > >> Brick1: virt3:/data/virt_images/brick > >> Brick2: virt2:/data/virt_images/brick > >> Brick3: printserver:/data/virt_images/brick (arbiter) > >> Options Reconfigured: > >> features.quota-deem-statfs: on > >> features.inode-quota: on > >> features.quota: on > >> features.barrier: disable > >> features.scrub: Active > >> features.bitrot: on > >> nfs.rpc-auth-allow: on > >> server.allow-insecure: on > >> user.cifs: off > >> features.shard: off > >> cluster.shd-wait-qlength: 10000 > >> cluster.locking-scheme: granular > >> cluster.data-self-heal-algorithm: full > >> cluster.server-quorum-type: server > >> cluster.quorum-type: auto > >> cluster.eager-lock: enable > >> network.remote-dio: enable > >> performance.low-prio-threads: 32 > >> performance.io-cache: off > >> performance.read-ahead: off > >> performance.quick-read: off > >> nfs.disable: on > >> transport.address-family: inet > >> server.outstanding-rpc-limit: 512 > >> > >> > Why are you not able to get the xattrs from arbiter brick? It is the > >> > same > >> > way as you do it on data bricks. > >> > >> Yes I must have confused myself yesterday somehow, here it is in full > >> from all three bricks: > >> > >> Brick 1 (virt2): # getfattr -d -m . 
-e hex fedora27.qcow2 > >> # file: fedora27.qcow2 > >> trusted.afr.dirty=0x000000000000000000000000 > >> trusted.afr.virt_images-client-1=0x000002280000000000000000 > >> trusted.afr.virt_images-client-3=0x000000000000000000000000 > >> trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563 > >> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba > >> > >> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d > 303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732 > >> > >> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1> 0x00000000a49eb0000000000000000001 > >> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001 > >> > >> Brick 2 (virt3): # getfattr -d -m . -e hex fedora27.qcow2 > >> # file: fedora27.qcow2 > >> trusted.afr.dirty=0x000000000000000000000000 > >> trusted.afr.virt_images-client-2=0x000003ef0000000000000000 > >> trusted.afr.virt_images-client-3=0x000000000000000000000000 > >> trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a > >> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba > >> > >> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d > 303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732 > >> > >> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1> 0x00000000a2fbe0000000000000000001 > >> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001 > >> > >> Brick 3 - arbiter (printserver): # getfattr -d -m . 
-e hex > fedora27.qcow2 > >> # file: fedora27.qcow2 > >> trusted.afr.dirty=0x000000000000000000000000 > >> trusted.afr.virt_images-client-1=0x000002280000000000000000 > >> trusted.bit-rot.version=0x31000000000000005a39237200073206 > >> trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba > >> > >> trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d 303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732 > >> > >> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1> 0x00000000000000000000000000000001 > >> trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001 > >> > >> I was expecting trusted.afr.virt_images-client-{1,2,3} on all bricks? > > > > From AFR-V2 we do not have self-blaming attrs. So you will see a brick > > blaming other bricks only. > > For example brick1 can blame brick2 & brick3, not itself. > >> > >> > >> > The changelog xattrs are named trusted.afr.virt_images-client-{1,2,3} > in > >> > the > >> > getxattr outputs you have provided. > >> > Did you do a remove-brick and add-brick any time? Otherwise it will be > >> > trusted.afr.virt_images-client-{0,1,2} usually. > >> > >> Yes, the bricks were moved around initially; brick 0 was re-created as > >> brick 2, and the arbiter was added later on as well. > >> > >> > > >> > To overcome this scenario you can do what Ben Turner had suggested. > >> > Select > >> > the source copy and change the xattrs manually. > >> > >> I won't mind doing that, but again, the guides assume that I have > >> trusted.afr.virt_images-client-{1,2,3} on all bricks, so I'm not sure > >> what to change to what, where. > >> > >> > >> > I am suspecting that it has hit the arbiter becoming source for data > >> > heal > >> > bug. But to confirm that we need the xattrs on the arbiter brick also. 
> >> > > >> > Regards, > >> > Karthik > >> > > >> > > >> > On Thu, Dec 21, 2017 at 9:55 AM, Ben Turner <bturner at redhat.com> > wrote: > >> >> > >> >> Here is the process for resolving split brain on replica 2: > >> >> > >> >> > >> >> > >> >> https://access.redhat.com/documentation/en-US/Red_Hat_ Storage/2.1/html/Administration_Guide/Recovering_from_File_Split- brain.html > >> >> > >> >> It should be pretty much the same for replica 3, you change the > xattrs > >> >> with something like: > >> >> > >> >> # setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 > >> >> /gfs/brick-b/a > >> >> > >> >> When I try to decide which copy to use I normally run things like: > >> >> > >> >> # stat /<path to brick>/path/to/file > >> >> > >> >> Check out the access and change times of the file on the back end > >> >> bricks. > >> >> I normally pick the copy with the latest access / change times. I'll > >> >> also > >> >> check: > >> >> > >> >> # md5sum /<path to brick>/path/to/file > >> >> > >> >> Compare the hashes of the file on both bricks to see if the data > >> >> actually > >> >> differs. If the data is the same it makes choosing the proper > replica > >> >> easier. > >> > >> The files on the bricks differ, so there was something changed, and > >> not replicated. > >> > >> Thanks for the input, I've looked at that, but couldn't get it to fit, > >> as I don't have trusted.afr.virt_images-client-{1,2,3} on all bricks. > > > > You can choose any one of the copies as good based on the latest > ctime/mtime. > > Before doing anything keep a backup of both copies, so that if > > something bad happens, > > you will have the data safe. > > Now choose one copy as good (based on timestamps/size/choosing a brick as > > source), > > and reset the xattrs set for that on the other brick. Then do a lookup on that > > file from the mount. > > That should resolve the issue. > > Once you are done, please let us know the result. 
> > > > Regards, > > Karthik > >> > >> > >> >> > >> >> Any idea how you got in this situation? Did you have a loss of NW > >> >> connectivity? I see you are using server-side quorum, maybe check > the > >> >> logs > >> >> for any loss of quorum? I wonder if there was a loss of quorum and > >> >> there > >> >> was some sort of race condition hit: > >> >> > >> >> > >> >> > >> >> http://docs.gluster.org/en/latest/Administrator%20Guide/ arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls > >> >> > >> >> "Unlike in client-quorum where the volume becomes read-only when > quorum > >> >> is > >> >> lost, loss of server-quorum in a particular node makes glusterd kill > >> >> the > >> >> brick processes on that node (for the participating volumes) making > >> >> even > >> >> reads impossible." > >> > >> I might have had a loss of server quorum, but I can't seem to see > >> exactly why or when from the logs: > >> > >> Times are synchronized between servers. Virt 3 was rebooted for > >> service at 17:29:39. The shutdown logs show an issue with unmounting > >> the bricks, probably because glusterd was still running: > >> Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/virt_images. > >> Dec 20 17:29:39 virt3 systemd[1]: data-filserver.mount: Mount process > >> exited, code=exited status=32 > >> Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/filserver. > >> Dec 20 17:29:39 virt3 systemd[1]: Unmounted /virt_images. > >> Dec 20 17:29:39 virt3 systemd[1]: Stopped target Network is Online. > >> Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered > >> file-system server... > >> Dec 20 17:29:39 virt3 systemd[1]: Stopping Network Name Resolution... > >> Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered > >> file-system server. > >> > >> I believe it was around this time that the virtual machine (running on > >> virt2) was stopped by qemu. 
> >> > >> > >> Brick 1 (virt2) only experienced loss of quorum when starting gluster > >> (glusterd.log confirms this): > >> Dec 20 17:22:03 virt2 systemd[1]: Starting GlusterFS, a clustered > >> file-system server... > >> Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997472] C > >> [MSGID: 106002] > >> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum lost for volume filserver. Stopping local > >> bricks. > >> Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997666] C > >> [MSGID: 106002] > >> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum lost for volume virt_images. Stopping > >> local bricks. > >> Dec 20 17:22:06 virt2 systemd[1]: Started GlusterFS, a clustered > >> file-system server. > >> Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.387238] C > >> [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume filserver. Starting > >> local bricks. > >> Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.390417] C > >> [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume virt_images. Starting > >> local bricks. > >> -- Reboot -- > >> Dec 20 18:41:35 virt2 systemd[1]: Starting GlusterFS, a clustered > >> file-system server... > >> Dec 20 18:41:41 virt2 systemd[1]: Started GlusterFS, a clustered > >> file-system server. > >> Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.387633] C > >> [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume filserver. Starting > >> local bricks. 
> >> Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.391080] C > >> [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume virt_images. Starting > >> local bricks. > >> > >> > >> Brick 2 (virt3) shows a network outage on the 19th, but everything > >> worked fine afterwards: > >> Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.382207] C > >> [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume filserver. Starting > >> local bricks. > >> Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.387324] C > >> [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume virt_images. Starting > >> local bricks. > >> Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered > >> file-system server... > >> Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered > >> file-system server. > >> -- Reboot -- > >> Dec 20 17:30:21 virt3 systemd[1]: Starting GlusterFS, a clustered > >> file-system server... > >> Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.826828] C > >> [MSGID: 106002] > >> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum lost for volume filserver. Stopping local > >> bricks. > >> Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.827188] C > >> [MSGID: 106002] > >> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum lost for volume virt_images. Stopping > >> local bricks. > >> Dec 20 17:30:23 virt3 systemd[1]: Started GlusterFS, a clustered > >> file-system server. 
> >> Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.488000] C > >> [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume filserver. Starting > >> local bricks. > >> Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.491446] C > >> [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume virt_images. Starting > >> local bricks. > >> Dec 20 18:31:06 virt3 systemd[1]: Stopping GlusterFS, a clustered > >> file-system server... > >> Dec 20 18:31:06 virt3 systemd[1]: Stopped GlusterFS, a clustered > >> file-system server. > >> -- Reboot -- > >> Dec 20 18:31:46 virt3 systemd[1]: Starting GlusterFS, a clustered > >> file-system server... > >> Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.958818] C > >> [MSGID: 106002] > >> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum lost for volume filserver. Stopping local > >> bricks. > >> Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.959168] C > >> [MSGID: 106002] > >> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum lost for volume virt_images. Stopping > >> local bricks. > >> Dec 20 18:31:47 virt3 systemd[1]: Started GlusterFS, a clustered > >> file-system server. > >> Dec 20 18:33:10 virt3 glusterd[386]: [2017-12-20 17:33:10.156180] C > >> [MSGID: 106001] > >> [glusterd-volume-ops.c:1534:glusterd_op_stage_start_volume] > >> 0-management: Server quorum not met. Rejecting operation. > >> Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.440395] C > >> [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume filserver. Starting > >> local bricks. 
> >> Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.446203] C > >> [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume virt_images. Starting > >> local bricks. > >> > >> Brick 3 - arbiter (printserver) shows no loss of quorum at that time > >> (again, glusterd.log confirms): > >> Dec 19 15:33:24 printserver systemd[1]: Starting GlusterFS, a > >> clustered file-system server... > >> Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19 > >> 14:33:26.432369] C [MSGID: 106002] > >> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum lost for volume filserver. Stopping local > >> bricks. > >> Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19 > >> 14:33:26.432606] C [MSGID: 106002] > >> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum lost for volume virt_images. Stopping > >> local bricks. > >> Dec 19 15:33:26 printserver systemd[1]: Started GlusterFS, a clustered > >> file-system server. > >> Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19 > >> 14:34:18.158756] C [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume filserver. Starting > >> local bricks. > >> Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19 > >> 14:34:18.162242] C [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume virt_images. Starting > >> local bricks. > >> Dec 20 18:28:52 printserver systemd[1]: Stopping GlusterFS, a > >> clustered file-system server... > >> Dec 20 18:28:52 printserver systemd[1]: Stopped GlusterFS, a clustered > >> file-system server. > >> -- Reboot -- > >> Dec 20 18:30:40 printserver systemd[1]: Starting GlusterFS, a > >> clustered file-system server... 
> >> Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20 > >> 17:30:42.441675] C [MSGID: 106002] > >> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum lost for volume filserver. Stopping local > >> bricks. > >> Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20 > >> 17:30:42.441929] C [MSGID: 106002] > >> [glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum lost for volume virt_images. Stopping > >> local bricks. > >> Dec 20 18:30:42 printserver systemd[1]: Started GlusterFS, a clustered > >> file-system server. > >> Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20 > >> 17:33:49.005534] C [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume filserver. Starting > >> local bricks. > >> Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20 > >> 17:33:49.008010] C [MSGID: 106003] > >> [glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action] > >> 0-management: Server quorum regained for volume virt_images. Starting > >> local bricks. > >> > >> >> > >> >> I wonder if the killing of brick processes could have led to some > sort > >> >> of > >> >> race condition where writes were serviced on one brick / the arbiter > >> >> and not > >> >> the other? > >> >> > >> >> If you can find a reproducer for this please open a BZ with it, I > have > >> >> been seeing something similar(I think) but I haven't been able to run > >> >> the > >> >> issue down yet. > >> >> > >> >> -b > >> > >> I'm not sure if I can replicate this, a lot has been going on in my > >> setup the past few days (trying to tune some horrible small-file and > >> file creation/deletion performance). > >> > >> Thanks for looking into this with me. > >> > >> Best regards, > >> Henrik Juul Pedersen > >> LIAB ApS > > > > >-------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20171222/ea58770e/attachment.html>
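[Editorial note: Karthik's distinction above -- true split-brain means every brick is blamed by some other brick, while here the arbiter was clean but can never serve as a data-heal source -- can be sketched as a small decision function. This is an illustrative model only; the brick names and the client-index-to-brick mapping are assumptions, not spelled out in the mails:

```python
def pick_data_heal_source(blame: dict, arbiter: str):
    """Return a brick usable as the data-heal source, or None.

    A brick qualifies only if no other brick blames it. The arbiter
    holds no file data, so even an unblamed arbiter is never a valid
    data source -- which is why heal stalled in this thread."""
    blamed = set()
    for accused in blame.values():
        blamed |= accused
    candidates = {b for b in blame if b not in blamed and b != arbiter}
    if not candidates:
        # Every data brick is blamed by some other brick. If the
        # arbiter is blamed too, it is a split-brain; otherwise it is
        # the "arbiter becoming source for data heal" dead end.
        return None
    return sorted(candidates)[0]

# Rough model of the state in this thread: the two data bricks blame
# each other, and the arbiter blames one of them.
blame = {
    "virt2":   {"virt3"},
    "virt3":   {"virt2"},
    "arbiter": {"virt3"},
}
print(pick_data_heal_source(blame, arbiter="arbiter"))  # -> None: heal cannot pick a source
```

After Henrik reset the accusing xattr on one data brick, that brick was no longer blamed, a source existed, and heal could proceed.]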