On Thu, Jul 17, 2014 at 7:26 AM, Nilesh Govindrajan <me at nileshgr.com>
wrote:
> Hello,
>
> I'm having a weird issue. I have this config:
>
> node2 ~ # gluster peer status
> Number of Peers: 1
>
> Hostname: sto1
> Uuid: f7570524-811a-44ed-b2eb-d7acffadfaa5
> State: Peer in Cluster (Connected)
>
> node1 ~ # gluster peer status
> Number of Peers: 1
>
> Hostname: sto2
> Port: 24007
> Uuid: 3a69faa9-f622-4c35-ac5e-b14a6826f5d9
> State: Peer in Cluster (Connected)
>
> Volume Name: home
> Type: Replicate
> Volume ID: 54fef941-2e33-4acf-9e98-1f86ea4f35b7
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: sto1:/data/gluster/home
> Brick2: sto2:/data/gluster/home
> Options Reconfigured:
> performance.write-behind-window-size: 2GB
> performance.flush-behind: on
> performance.cache-size: 2GB
> cluster.choose-local: on
> storage.linux-aio: on
> transport.keepalive: on
> performance.quick-read: on
> performance.io-cache: on
> performance.stat-prefetch: on
> performance.read-ahead: on
> cluster.data-self-heal-algorithm: diff
> nfs.disable: on
>
> sto1/sto2 are aliases for node1/node2 respectively.
>
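(The aliases are plain /etc/hosts entries on both boxes; the IPs below are
placeholders, not my actual addresses:)

10.0.0.1    node1 sto1
10.0.0.2    node2 sto2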
> As you see, NFS is disabled, so I'm using the native FUSE mount on both
> nodes.
> The volume contains files and PHP scripts that are served on various
> websites. When both nodes are active, I get split-brain on many files,
> and the mount on node2 returns 'input/output error' for many of them,
> which causes HTTP 500 errors.
>
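(For reference, this is how I'm spotting the affected files; the heal info
command is from the 3.4 CLI:)

node2 ~ # gluster volume heal home info split-brain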
> I delete the affected files from the brick using find -samefile. That
> fixes things for a few minutes, and then the problem is back.
>
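(Roughly the procedure, with an example path; -samefile also turns up the
GFID hardlink under .glusterfs:)

node2 ~ # find /data/gluster/home -samefile /data/gluster/home/example/file.php

(that prints the file itself plus its .glusterfs hardlink; I rm both on the
bad brick, then stat the file from the FUSE mount so it heals from sto1)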
> What could be the issue? This happens even if I use the NFS mounting
> method.
>
> Gluster 3.4.4 on Gentoo.
And yes, network connectivity is not an issue between them, as both are
located in the same DC. They're connected via a 1 Gbit line (shared between
internal and external traffic), but external traffic doesn't exceed 200-500
Mbit/s, leaving quite a good window for Gluster. I also tried enabling
quorum, but that doesn't help either.
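(For completeness, this is roughly what I set when trying quorum — option
names as listed by 'gluster volume set help'; exact values from my history
may differ. On a two-brick replica, 'auto' makes the first brick mandatory:)

node1 ~ # gluster volume set home cluster.quorum-type auto
node1 ~ # gluster volume set home cluster.server-quorum-type server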