Daniel,
There were fixes that went into HA recently. Can you check if the bug
is still there?
Krishna
On Wed, Jan 14, 2009 at 11:22 PM, Daniel Maher <dma+gluster at witbe.net> wrote:
> Hi all,
>
> In testing the HA translator under 2.0.0rc1, I've managed to create a
> simple and reproducible scenario in which Gluster fails to maintain
> communication between the client and the server(s).
>
> Server01 and Server02 are AFR'ing each other, with Client01 connected
> via the HA translator. As a simple test, I launch a script that echoes
> an increasing counter to a text file in the Gluster mount on Client01.
> Client01 is communicating with Server01 in this instance.
>
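For anyone trying to reproduce this, a minimal sketch of a counter script like the one described above might look as follows. The function name `write_counter`, the append-per-line behaviour, and the one-second default delay are assumptions for illustration, not taken from the original report.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): append an increasing
# counter to a file, one value per line.
# Usage: write_counter FILE COUNT [DELAY_SECONDS]
write_counter() {
    file=$1
    count=$2
    delay=${3:-1}   # assumed default: one write per second
    i=1
    while [ "$i" -le "$count" ]; do
        # If the write fails (for example while the mount reports
        # "Transport endpoint is not connected"), log the counter value
        # and a timestamp so the gap can be matched against the client log.
        if ! echo "$i" >> "$file"; then
            echo "write $i failed at $(date '+%Y-%m-%d %H:%M:%S')" >&2
        fi
        sleep "$delay"
        i=$((i + 1))
    done
}
```

Calling it as `write_counter /mnt/glusterfs/counter.txt 600` (the mount point here is a placeholder) would write for roughly ten minutes, which leaves time to stop and restart glusterfsd on each server in turn.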
> I cleanly stop glusterfsd on Server01, and after a momentary hiccup
> (noted in the log excerpt below), things continue to function as
> expected - Client01 commences communication with Server02. So far so good.
>
> 2009-01-15 15:54:19 E [socket.c:708:socket_connect_finish] export01:
> connection failed (Connection refused)
>
> I re-start glusterfsd on Server01, then I cleanly stop glusterfsd on
> Server02 (which, of course, Client01 is now communicating with).
> Client01 freaks out (see log excerpt below), does /not/ attempt to
> contact Server01 again, and leaves me with the dreaded "transport
> endpoint not connected" situation.
>
> 2009-01-15 16:06:02 E [ha-helpers.c:266:_ha_next_active_child_for_ctx]
> export-ha: none of the children are connected other than export02
> 2009-01-15 16:06:02 E [ha.c:2715:ha_fstat_cbk] export-ha: no active
> subvolume
> 2009-01-15 16:06:02 E [fuse-bridge.c:533:fuse_attr_cbk] glusterfs-fuse:
> 2932: FSTAT() /counter.txt => -1 (Transport endpoint is not connected)
>
> Client01 sometimes recovers from this, and sometimes it does not. When
> it does not recover from this situation, the only solution is manual
> intervention (unmount / remount). That's not the worst of it, though :
> when it /does/ recover, re-starting glusterfsd on Server02 (!) causes
> even more of the errors (see below), and /always/ results in a total
> failure on Client01 within a second or two (transport endpoint not
> connected). Client01 never recovers from this.
>
> 2009-01-15 19:04:56 E [ha-helpers.c:266:_ha_next_active_child_for_ctx]
> export-ha: none of the children are connected other than export01
> 2009-01-15 19:04:56 E [ha.c:2515:ha_flush_cbk] export-ha: no active
> subvolume
> 2009-01-15 19:04:56 E [fuse-bridge.c:911:fuse_err_cbk] glusterfs-fuse:
> 3058: FLUSH() ERR => -1 (Transport endpoint is not connected)
>
>
> I strongly suspect this is not the expected behaviour of the High
> Availability translator. :)
>
>
> Servers are running FC9 i386, Client is FC10 i386.
>
> # glusterfs --version
> glusterfs 2.0.0rc1 built on Jan 14 2009 13:19:06
> Repository revision: glusterfs--mainline--3.0--patch-844
>
> # rpm -qa | grep fuse
> fuse-2.7.3glfs10-1.i386
> fuse-devel-2.7.3glfs10-1.i386
> fuse-libs-2.7.3glfs10-1.i386
>
>
> Server config :
>
> # cat /etc/glusterfs/glusterfs-server.vol
> # dataspace
> volume test-ds
> type storage/posix
> option directory /opt/datadir
> end-volume
>
> # posix locks for test-ds
> volume test-ds-locks
> type features/locks
> option mandatory-locks on
> subvolumes test-ds
> end-volume
>
> # dataspace of test-ds on Server01
> volume test-01-ds
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.0.183
> option remote-subvolume test-ds-locks
> option transport-timeout 10
> end-volume
>
> # automatic file replication translator for test dataspace
> volume test-ds-afr
> type cluster/afr
> subvolumes test-ds-locks test-01-ds
> end-volume
>
> # the actual export
> volume export
> type performance/io-threads
> option thread-count 8
> subvolumes test-ds-afr
> end-volume
>
> # server declaration
> volume server
> type protocol/server
> option transport-type tcp/server
> subvolumes export
> option auth.addr.export.allow 192.168.0.73,192.168.0.183,192.168.0.166,127.0.0.1
> option auth.addr.test-ds-locks.allow 192.168.0.73,192.168.0.183,192.168.0.166,127.0.0.1
> end-volume
>
>
>
> client config :
> # cat /etc/glusterfs/glusterfs-client.vol
>
> # export on Server01
> volume export01
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.0.183
> option remote-subvolume export # exported volume
> end-volume
>
> # export on Server02
> volume export02
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.0.166
> option remote-subvolume export # exported volume
> end-volume
>
> # exports clustered via HA
> volume export-ha
> type cluster/ha
> subvolumes export01 export02
> end-volume
>
>
>
> --
> Daniel Maher <dma+gluster AT witbe DOT net>
>