Remi Broemeling
2011-May-16 17:17 UTC
[Gluster-users] Rebuild Distributed/Replicated Setup
Hi,
I've got a distributed/replicated GlusterFS v3.1.2 (installed via RPM) setup
across two servers (web01 and web02) with the following vol config:
volume shared-application-data-client-0
    type protocol/client
    option remote-host web01
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-client-1
    type protocol/client
    option remote-host web02
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-replicate-0
    type cluster/replicate
    subvolumes shared-application-data-client-0 shared-application-data-client-1
end-volume

volume shared-application-data-write-behind
    type performance/write-behind
    subvolumes shared-application-data-replicate-0
end-volume

volume shared-application-data-read-ahead
    type performance/read-ahead
    subvolumes shared-application-data-write-behind
end-volume

volume shared-application-data-io-cache
    type performance/io-cache
    subvolumes shared-application-data-read-ahead
end-volume

volume shared-application-data-quick-read
    type performance/quick-read
    subvolumes shared-application-data-io-cache
end-volume

volume shared-application-data-stat-prefetch
    type performance/stat-prefetch
    subvolumes shared-application-data-quick-read
end-volume

volume shared-application-data
    type debug/io-stats
    subvolumes shared-application-data-stat-prefetch
end-volume
In total, four servers mount this via GlusterFS FUSE. For whatever reason
(I'm really not sure why), the GlusterFS filesystem has run into a bit of a
split-brain nightmare (although to my knowledge an actual split-brain
situation has never occurred in this environment), and I have been seeing
persistent corruption issues across the filesystem as well as complaints
that the filesystem cannot be self-healed.
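For reference, one way to confirm what AFR itself thinks of a suspect file
is to inspect the changelog xattrs directly on each brick. A minimal
sketch, assuming the brick path from the vol config above (the file path
is a placeholder):

# Run on both web01 and web02 against the same file on each brick.
# AFR stores pending-operation counters in trusted.afr.<client-volume>
# xattrs; if each copy shows non-zero counters blaming the other brick,
# AFR treats the file as split-brain.
getfattr -d -m trusted.afr -e hex /var/glusterfs/bricks/shared/path/to/suspect-file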
What I would like to do is completely empty one of the two servers (here I
am trying to empty web01), making the other one (in this case web02) the
authoritative source for the data, and then have web01 completely rebuild
its mirror directly from web02.
What's the easiest/safest way to do this? Is there a command that I can run
that will force web01 to re-initialize its mirror directly from web02 (and
thus completely eradicate all of the split-brain errors and data
inconsistencies)?
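For reference, 3.1.2 has no dedicated resync command (the gluster volume
heal command only arrived in later releases), so the procedure usually
described for this era is: empty the stale brick, then force a full
self-heal by walking the tree through a FUSE mount. A rough sketch under
those assumptions (brick path from the config above; /mnt/shared is a
placeholder mount point; no writers while it runs):

# On web01 only: empty the stale brick, dotfiles included.
# Triple-check the path -- this destroys web01's copy of the data.
find /var/glusterfs/bricks/shared -mindepth 1 -delete

# Still on web01: clear any stale AFR changelog xattrs left on the
# brick root itself (names assume the client volumes from the config).
setfattr -x trusted.afr.shared-application-data-client-0 /var/glusterfs/bricks/shared
setfattr -x trusted.afr.shared-application-data-client-1 /var/glusterfs/bricks/shared

# On any client: stat every file through the mount so AFR notices the
# missing copies and re-creates them from web02.
find /mnt/shared -noleaf -print0 | xargs --null stat > /dev/null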
Thanks!
--
Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | remi at goclio.com
www.goclio.com | blog <http://www.goclio.com/blog> | twitter <http://www.twitter.com/goclio> | facebook <http://www.facebook.com/goclio>
Pranith Kumar. Karampuri
2011-May-17 02:48 UTC
[Gluster-users] Rebuild Distributed/Replicated Setup
Hi Remi,
Would it be possible to post the logs from the client, so that we can find
out what issue you are running into?
Pranith
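For reference, the FUSE client writes its log under /var/log/glusterfs/ by
default, named after the mount point with slashes turned into dashes, so
something like this should pull out the relevant lines (the /mnt/shared
mount path is an assumption):

# e.g. a volume mounted at /mnt/shared logs to mnt-shared.log
grep -E 'split-brain|self-heal' /var/log/glusterfs/mnt-shared.log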
----- Original Message -----
From: "Remi Broemeling" <remi at goclio.com>
To: gluster-users at gluster.org
Sent: Monday, May 16, 2011 10:47:33 PM
Subject: [Gluster-users] Rebuild Distributed/Replicated Setup