But in the vast majority of cases I'm not seeing specific paths to
split-brained files. All I get is a big list of GFIDs with one or two
human-readable paths dotted in there (that weren't there when I first
posted a week ago). How do I go from a GFID to a file I can identify?
gluster volume heal <vol-name> info
Brick server1:/brick
<gfid:85893940-63a8-4fa3-bf83-9e894fe852c7>
<gfid:8b325ef9-a8d2-4088-a8ae-c73f4b9390fc>
<gfid:ed815f9b-9a97-4c21-86a1-da203b023cda>
/some/path/to/a/known/file   <- that only seems to exist on one server
<gfid:7fdbd6da-b09d-4eaf-a99b-2fbe889d2c5f>
...
Number of entries: 217
Brick server2:/brick
Number of entries: 0
and
gluster volume heal <vol-name> info split-brain
Brick server1:/brick
Number of entries in split-brain: 0
Brick server2:/brick
Number of entries in split-brain: 0
??
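Is the trick here to chase the .glusterfs hard links on the brick? My
tentative understanding (untested, and assuming the brick really is at
/brick and that regular files get a hard link named after their GFID under
.glusterfs/) is something like:

# on server1, for the first GFID in the list above
GFID=85893940-63a8-4fa3-bf83-9e894fe852c7
ls -l /brick/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID
# for a regular file the real path shares the same inode, so:
find /brick -samefile /brick/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID \
  -not -path '*/.glusterfs/*'

Is that the right approach, or is there a supported command for it?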
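Also, on the quorum advice below - if I've read the Red Hat docs right, is
it just a matter of something like the following once a third (arbiter-style)
peer is in the pool? (sketch only, not yet tried here):

gluster volume set <vol-name> cluster.quorum-type auto
gluster volume set <vol-name> cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 51%

Or have I misunderstood which of the client/server quorum options applies?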
> -----Original Message-----
> From: Diego Remolina [mailto:dijuremo at gmail.com]
> Sent: 30 October 2015 14:29
> To: Iain Milne
> Cc: gluster-users at gluster.org List
> Subject: Re: [Gluster-users] Avoiding Split Brains
>
> Read carefully the blog from JoeJulian, it tells you how to identify and
> clear the files in split brain. Make sure you have good backups prior to
> erasing anything.
>
> https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
>
> He even provides a script.
>
> Diego
>
> On Fri, Oct 30, 2015 at 10:08 AM, Iain Milne <glusterfs at noognet.org> wrote:
> > Ok, thanks - that certainly helps (a lot!), but what about all these
> > gfid files? Are they files in split-brain or something else? The links
> > don't cover dealing with anything like this :-(
> >
> > My impression is that maybe they're files that haven't replicated
> > and/or haven't been self healed, for whatever reason...
> >
> >> -----Original Message-----
> >> From: Diego Remolina [mailto:dijuremo at gmail.com]
> >> Sent: 30 October 2015 12:58
> >> To: Iain Milne
> >> Cc: gluster-users at gluster.org List
> >> Subject: Re: [Gluster-users] Avoiding Split Brains
> >>
> >> Yes, you need to avoid split brain on a two node replica=2 setup. You
> >> can just add a third node with no bricks which serves as the arbiter
> >> and set quorum to 51%.
> >>
> >> If you set quorum to 51% and do not have more than 2 nodes, then when
> >> one goes down all your gluster mounts become unavailable (or is it
> >> just read only?). If you run VMs on top of this then you usually end
> >> up with paused/frozen VMs until the volume becomes available again.
> >>
> >> These are RH specific docs, but may help:
> >>
> >> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/sect-User_Guide-Managing_Volumes-Quorum.html
> >>
> >> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html
> >>
> >> First time in testing I hit split brain, I found this blog very
> >> useful:
> >>
> >> https://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/
> >>
> >> HTH,
> >>
> >> Diego
> >>
> >> On Fri, Oct 30, 2015 at 8:46 AM, Iain Milne <glusterfs at noognet.org> wrote:
> >> > Anyone?
> >> >
> >> >> -----Original Message-----
> >> >> From: gluster-users-bounces at gluster.org
> >> >> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Iain Milne
> >> >> Sent: 21 October 2015 09:23
> >> >> To: gluster-users at gluster.org
> >> >> Subject: [Gluster-users] Avoiding Split Brains
> >> >>
> >> >> Hi all,
> >> >>
> >> >> We've been running a distributed setup for 3 years with no issues.
> >> >> Recently we switched to a 2-server, replicated setup (soon to be 4
> >> >> servers) and keep encountering what I assume are split-brain
> >> >> situations, eg:
> >> >>
> >> >> Brick server1:/brick
> >> >> <gfid:85893940-63a8-4fa3-bf83-9e894fe852c7>
> >> >> <gfid:8b325ef9-a8d2-4088-a8ae-c73f4b9390fc>
> >> >> <gfid:ed815f9b-9a97-4c21-86a1-da203b023cda>
> >> >> <gfid:7fdbd6da-b09d-4eaf-a99b-2fbe889d2c5f>
> >> >> ...
> >> >> Number of entries: 217
> >> >>
> >> >> Brick server2:/brick
> >> >> Number of entries: 0
> >> >>
> >> >> a) What does this mean?
> >> >> b) How do I go about fixing it?
> >> >>
> >> >> And perhaps more importantly, how do I avoid this happening in the
> >> >> future?
> >> >> Not once since moving to replication has either of the two servers
> >> >> been offline or unavailable (to my knowledge).
> >> >>
> >> >> Is some sort of server/client quorum needed (that I admit I don't
> >> >> fully understand)? While high-availability would be nice to have,
> >> >> it's not essential - robustness of the data is.
> >> >>
> >> >> Thanks
> >> >>
> >> >> Iain