Daniel Mons
2013-Apr-10 10:44 UTC
[Gluster-users] GlusterFS 3.3.1 split-brain rsync question
Our production GlusterFS 3.3.1GA setup is a 3x2 distribute-replicate, with 100TB usable for staff. This is one of 4 identical GlusterFS clusters we're running.

Very early in the life of our production Gluster rollout, we ran Netatalk 2.X to share files with MacOSX clients (due to slow negative lookup on CIFS/Samba for those pesky resource fork files in MacOSX's Finder). Netatalk 2.X wrote its CNID_DB files back to Gluster, which caused enormous IO, locking up many nodes at a time (lots of "hung task" errors in dmesg/syslog).

We've since moved to Netatalk 3.X, which puts its CNID_DB files elsewhere (we put them on local SSD RAID), and the lockups have vanished. However, our split-brain files number in the tens of thousands due to those previous lockups, and they aren't always predictable (i.e. it's not always the case that brick0 is "good" and brick1 is "bad"). Manually fixing the files is far too time consuming.

I've written a rudimentary script that trawls /var/log/glusterfs/glustershd.log for split-brain GFIDs, tracks each one down on the matching pair of bricks, and decides via a few rules (size tends to be a good indicator for us, as bigger files tend to be more recent ones) which is the "good" file. This works for about 80% of files, which will dramatically reduce the amount of data we have to check manually. (A rough sketch is at the end of this mail.)

My question is: what should I do from here? Options are:

Option 1) Delete the file from the "bad" brick

Option 2) rsync the file from the "good" brick to the "bad" brick with the -aX flag (preserve everything, including the trusted.afr.$server and trusted.gfid xattrs)

Option 3) rsync the file from "good" to "bad", and then setfattr -x trusted.* on the bad brick

Which of these is considered the better (more glustershd compatible) option? Or alternatively, is there something else that's preferred?

Normally I'd just test this on our backup gluster, but as it was never running Netatalk, it has no split-brain problems, so I can't test the functionality.

Thanks for any insight provided,

-Dan
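P.S. For the curious, the scan boils down to something like the sketch below. The log location, the brick paths and the bigger-file-wins rule are just what I described above, and the GFID regex simply pulls UUID-shaped strings out of any glustershd.log line mentioning "split-brain", so treat it as a starting point rather than the actual script.

#!/bin/bash
# Sketch only: list split-brain GFIDs and guess the "good" brick by size.
# BRICK_A/BRICK_B are placeholder paths for one replica pair.
LOG=/var/log/glusterfs/glustershd.log
BRICK_A=/data/brick0/gv0
BRICK_B=/data/brick1/gv0

grep -i 'split-brain' "$LOG" \
  | grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' \
  | sort -u \
  | while read -r gfid; do
      # Every file on a 3.3 brick is hard-linked under .glusterfs/xx/yy/<gfid>
      rel=".glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"
      size_a=$(stat -c %s "$BRICK_A/$rel" 2>/dev/null || echo 0)
      size_b=$(stat -c %s "$BRICK_B/$rel" 2>/dev/null || echo 0)
      if [ "$size_a" -ge "$size_b" ]; then
          echo "$gfid good=$BRICK_A bad=$BRICK_B"
      else
          echo "$gfid good=$BRICK_B bad=$BRICK_A"
      fi
    done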
Pete Smith
2013-Apr-10 15:41 UTC
[Gluster-users] GlusterFS 3.3.1 split-brain rsync question
Hi Dan

I've come up against this recently whilst trying to delete large amounts of files from our cluster.

I'm resolving it with the method from
http://comments.gmane.org/gmane.comp.file-systems.gluster.user/1917

With Fabric as a helping hand, it's not too tedious (rough per-file sketch at the end of this mail).

Not sure about the level of glustershd compatibility, but it's working for me.

HTH

Pete

-- 
Pete Smith
DevOp/System Administrator
Realise Studio
12/13 Poland Street, London W1F 8QB
T. +44 (0)20 7165 9644

realisestudio.com
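P.S. In case it helps, the per-file step my Fabric task runs boils down to the sketch below. The brick path, client mount and file path are placeholders, and the gist is only roughly what that thread describes as I remember it: remove the bad copy and its .glusterfs hard link, then stat the file through a client mount so self-heal pulls it back from the good brick. Test it on something unimportant first.

#!/bin/bash
# Sketch only -- run as root on the server holding the "bad" brick.
BAD_BRICK=/data/brick1/gv0       # placeholder brick path
MOUNT=/mnt/gv0                   # placeholder client mount
FILE=path/to/broken/file         # path relative to the volume root

# Read the file's GFID from the trusted.gfid xattr on the brick copy
gfid=$(getfattr -n trusted.gfid -e hex "$BAD_BRICK/$FILE" 2>/dev/null \
        | awk -F'0x' '/trusted.gfid=/{print $2}')
[ -n "$gfid" ] || { echo "could not read gfid for $FILE" >&2; exit 1; }

# Rebuild the dashed GFID form used by the .glusterfs directory layout
gfid_d="${gfid:0:8}-${gfid:8:4}-${gfid:12:4}-${gfid:16:4}-${gfid:20:12}"

# Remove the bad copy and its GFID hard link...
rm -f "$BAD_BRICK/$FILE"
rm -f "$BAD_BRICK/.glusterfs/${gfid_d:0:2}/${gfid_d:2:2}/${gfid_d}"

# ...then stat through the client mount to trigger self-heal from the good brick
stat "$MOUNT/$FILE" > /dev/null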