Marcus Bointon
2013-Mar-01 00:37 UTC
[Gluster-users] 3.3.1 Replicate only replicating one way
I've given up on trying to upgrade a 3.2.5 installation to 3.3.1 directly, so I'm scrapping it and starting again. I'm on Ubuntu Lucid, using stock packages from the semiosis PPA.

My config is very simple: 2 nodes running replicate on a single volume with 4GB of small files, created like this:

# gluster volume create shared replica 2 transport tcp 192.168.0.8:/var/shared 192.168.0.34:/var/shared

I copied all files off the gluster volume, removed all signs of gluster 3.2.5, installed 3.3.1, and reconfigured using the same commands as for 3.2.5. Install, peer probe, volume creation and mount (via NFS) all reported working correctly. The problem I'm now seeing is that I can touch a file on one side and it appears on the other, but not the other way around.

If I ask for heal info on the volume, both nodes report zero differences, but ls shows there are! If I request a full heal, the files appear correctly and the fixed files appear in the healed list. Something is clearly not talking...

I doubt it's a firewall issue, since this was previously a working setup and the firewall hasn't been touched.

I'm finding it hard to track down since gluster's logs are spread across so many places (just this simple config has 20+ logs), and I've not found anything to explain this behaviour.

Node 1:

# gluster peer status
Number of Peers: 1

Hostname: 192.168.0.8
Uuid: 8f30902f-f125-47bc-87dd-fa48e583efd3
State: Peer in Cluster (Connected)

# gluster volume status
Status of volume: shared
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.0.8:/var/shared                  24010   Y       22440
Brick 192.168.0.34:/var/shared                 24009   Y       16957
NFS Server on localhost                        38467   Y       16963
Self-heal Daemon on localhost                  N/A     Y       16969
NFS Server on 192.168.0.8                      38467   Y       22446
Self-heal Daemon on 192.168.0.8                N/A     Y       22452

Node 2:

# gluster peer status
Number of Peers: 1

Hostname: 192.168.0.34
Uuid: cf6d4c23-a5a2-4c35-859c-52410b6429e1
State: Peer in Cluster (Connected)

# gluster volume status
Status of volume: shared
Gluster process                                Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.0.8:/var/shared                  24010   Y       22440
Brick 192.168.0.34:/var/shared                 24009   Y       16957
NFS Server on localhost                        38467   Y       22446
Self-heal Daemon on localhost                  N/A     Y       22452
NFS Server on 192.168.0.34                     38467   Y       16963
Self-heal Daemon on 192.168.0.34               N/A     Y       16969

Having said all that, I've just noticed that files *are* appearing on the other node in the direction I thought they were not - but it's *really* slow. I copied about 10,000 files onto it and they are all visible on one node, but after 30 minutes only 10% of them are present on the other node, and they are all listed in the 'info healed' output. This sounds to me as if the replication is only happening in one direction via self-heal, and not through the normal replication route - it's certainly not synchronous. Any idea what could be amiss?

Marcus
--
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info at hand CRM solutions
marcus at synchromedia.co.uk | http://www.synchromedia.co.uk/
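For anyone retracing these steps, the operations described above correspond to the following 3.3.x commands (a sketch: the volume name comes from the create command above, the mount point /mnt/shared is a placeholder, and Gluster's built-in NFS server speaks NFSv3 over TCP only, hence the mount options):

# gluster volume heal shared info            # entries currently pending heal
# gluster volume heal shared full            # trigger a full crawl and heal of the volume
# gluster volume heal shared info healed     # entries the self-heal daemon has already fixed
# mount -t nfs -o vers=3,proto=tcp 192.168.0.34:/shared /mnt/shared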
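The firewall theory is also easy to rule out directly. A quick connectivity check from each node might look like this (a sketch using netcat; the brick ports are taken from the volume status output above, and 24007 is glusterd's standard management port):

# nc -z -w 3 192.168.0.8 24007 && echo "glusterd on .8 reachable"
# nc -z -w 3 192.168.0.8 24010 && echo "brick on .8 reachable"
# nc -z -w 3 192.168.0.34 24009 && echo "brick on .34 reachable"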
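The "one direction via self-heal only" suspicion can also be checked on the bricks themselves by inspecting the AFR changelog xattrs (a sketch; /var/shared/somefile is a placeholder for any file present on one brick, and non-zero trusted.afr.* counters mean operations are still pending against the other replica):

# getfattr -d -m trusted.afr -e hex /var/shared/somefile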
Todd Stansell
2013-Mar-06 08:02 UTC
[Gluster-users] 3.3.1 Replicate only replicating one way
In our recent testing, we saw all kinds of weird problems while testing rebuilding a failed brick in the same kind of 2-node replicate cluster. Several times we had to kill off all gluster processes and restart things from scratch to get the two sides talking correctly again (where both sides thought they were happily talking to the other side, but self-heal wasn't doing anything). We'd run a full heal or stat some files and they wouldn't replicate back to the other side. After restarting the processes (not just glusterd, but all of the glusterfs ones too), things would start working. Once things were running and the nodes were properly replicating, replication appeared to flow both ways nicely.

We also saw an lstat on a client mount hang once for 105 seconds while we were rsyncing data into our cluster. No idea why things would lock up for that long. It was an lstat of a directory full of 4GB ISO files, so maybe it was waiting for the ISOs to copy to both boxes. At gigabit speed (~950Mbps, roughly 119MB/s), though, 105 seconds is something like 12GB of data. And I'm not sure why it would lock out lstat calls.

I'm new to glusterfs, so I don't really have anything more to add. I just wanted you to know I've seen similar weirdness with 3.3.1 in a relatively simple replicate configuration.

Todd

On Fri, Mar 01, 2013 at 01:37:42AM +0100, Marcus Bointon wrote:
> [snip]
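For reference, the "kill everything and restart" sequence Todd describes looks roughly like this (a sketch; the init script is glusterfs-server on Ubuntu and glusterd on some other distros, and the pkill will also take down any local FUSE client mounts):

# service glusterfs-server stop
# pkill glusterfsd                 # brick daemons
# pkill glusterfs                  # NFS server, self-heal daemon, local clients
# service glusterfs-server start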