On Thu, Jul 17, 2014 at 7:26 AM, Nilesh Govindrajan <me at nileshgr.com>
wrote:
> Hello,
>
> I'm having a weird issue. I have this config:
>
> node2 ~ # gluster peer status
> Number of Peers: 1
>
> Hostname: sto1
> Uuid: f7570524-811a-44ed-b2eb-d7acffadfaa5
> State: Peer in Cluster (Connected)
>
> node1 ~ # gluster peer status
> Number of Peers: 1
>
> Hostname: sto2
> Port: 24007
> Uuid: 3a69faa9-f622-4c35-ac5e-b14a6826f5d9
> State: Peer in Cluster (Connected)
>
> Volume Name: home
> Type: Replicate
> Volume ID: 54fef941-2e33-4acf-9e98-1f86ea4f35b7
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: sto1:/data/gluster/home
> Brick2: sto2:/data/gluster/home
> Options Reconfigured:
> performance.write-behind-window-size: 2GB
> performance.flush-behind: on
> performance.cache-size: 2GB
> cluster.choose-local: on
> storage.linux-aio: on
> transport.keepalive: on
> performance.quick-read: on
> performance.io-cache: on
> performance.stat-prefetch: on
> performance.read-ahead: on
> cluster.data-self-heal-algorithm: diff
> nfs.disable: on
>
> sto1/sto2 are aliases for node1/node2 respectively.
>
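(The aliases are plain /etc/hosts entries on both boxes; the IPs below are
placeholders, not my actual addresses:)

10.0.0.1    node1 sto1
10.0.0.2    node2 sto2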
> As you see, NFS is disabled, so I'm using the native FUSE mount on both
> nodes.
> The volume contains files and PHP scripts that are served on various
> websites. When both nodes are active, I get split-brain on many files,
> and the mount on node2 returns 'input/output error' for many of them,
> which causes HTTP 500 errors.
>
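(For reference, this is how I'm spotting the affected files; the heal info
command is from the 3.4 CLI:)

node2 ~ # gluster volume heal home info split-brain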
> I delete the affected files from the brick using find -samefile. That
> fixes things for a few minutes, and then the problem is back.
>
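(Roughly the procedure, with an example path; -samefile also turns up the
GFID hardlink under .glusterfs:)

node2 ~ # find /data/gluster/home -samefile /data/gluster/home/example/file.php

(that prints the file itself plus its .glusterfs hardlink; I rm both on the
bad brick, then stat the file from the FUSE mount so it heals from sto1)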
> What could be the issue? This happens even if I use the NFS mounting
> method.
>
> Gluster 3.4.4 on Gentoo.
And yes, network connectivity is not an issue between them, as both are
located in the same DC. They're connected via a 1 Gbit line (shared between
internal and external traffic), but external traffic doesn't exceed 200-500
Mbit/s, leaving quite a good window for Gluster. I also tried enabling
quorum, but that doesn't help either.
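(For completeness, this is roughly what I set when trying quorum — option
names as listed by 'gluster volume set help'; exact values from my history
may differ. On a two-brick replica, 'auto' makes the first brick mandatory:)

node1 ~ # gluster volume set home cluster.quorum-type auto
node1 ~ # gluster volume set home cluster.server-quorum-type server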