Hi Adrian,
Correct me if I have misunderstood: you have 2 servers, and a client that
replicates to both of them. If the first server is down, the client also
stops responding. You mentioned more than one client. Can you clarify that
so that we can try to understand the issue?
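
In the meantime, it might help to confirm from each client machine which
bricks are actually reachable at the moment the mount fails. Below is a
minimal sketch, not an official GlusterFS tool. It assumes the two brick
addresses from your client volfile and port 6996 as shown in your log; the
names BRICKS, brick_reachable and check_bricks are only illustrative. It
tests plain TCP reachability, not the GlusterFS handshake itself.

```python
import socket

# Brick addresses copied from the client volfile quoted below. Port 6996 is
# the one that appears in the client log; adjust it if your setup differs.
BRICKS = {
    "brick1": ("172.19.45.102", 6996),
    "brick2": ("172.19.45.103", 6996),
}


def brick_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and unreachable hosts.
        return False


def check_bricks(bricks, probe=brick_reachable):
    """Map each brick name to True/False using the given probe function."""
    return {name: probe(host, port) for name, (host, port) in bricks.items()}


if __name__ == "__main__":
    for name, up in check_bricks(BRICKS).items():
        print(f"{name}: {'reachable' if up else 'NOT reachable'}")
```

If you run this on both clients after stopping one server, exactly one brick
should be reported as not reachable on each of them. Any other result would
be worth mentioning in your reply.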
Pavan
On 01/10/09 08:41 +0200, Adrian Moisey wrote:
> Hi
>
> I am currently testing GlusterFS with replication.
> I am running Ubuntu hardy using packages from the PPA on launchpad.net.
> I am currently using glusterfs 2.0.6.
>
> I have 2 machines, both exporting 1 brick each. This is the config I'm
> using:
>
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
> volume posix
> type storage/posix
> option directory /home/export/
> end-volume
>
> volume locks
> type features/locks
> subvolumes posix
> end-volume
>
> volume cache
> type performance/io-cache
> subvolumes locks
> end-volume
>
> volume brick
> type performance/io-threads
> option thread-count 8
> subvolumes cache
> end-volume
>
> ### Add network serving capability to above brick.
> volume server
> type protocol/server
> option transport-type tcp
> subvolumes brick
> option auth.addr.brick.allow * # Allow access to "brick" volume
> end-volume
>
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
>
> I then have 2 clients (which happen to be the same 2 machines) that
> connect to both bricks and replicate them using this config:
>
>
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
> ### Add client feature and attach to remote subvolume of server1
> volume brick1
> type protocol/client
> option transport-type tcp
> option remote-host 172.19.45.102 # IP address of the remote brick
> option remote-subvolume brick # name of the remote volume
> end-volume
>
> ### Add client feature and attach to remote subvolume of server2
> volume brick2
> type protocol/client
> option transport-type tcp
> option remote-host 172.19.45.103 # IP address of the remote brick
> option remote-subvolume brick # name of the remote volume
> end-volume
>
> volume replicate
> type cluster/replicate
> subvolumes brick1 brick2
> end-volume
>
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
>
> If I start the 2 servers up, then mount both clients, everything works
> fine. I have shared storage which is replicated to each host.
>
> If I shut one brick down, the client on that machine also dies and I
> get strange errors:
>
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
> # cd /mnt/gluster
> bash: cd: /mnt/gluster: Transport endpoint is not connected
> # df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda1 9.5G 1.1G 7.9G 13% /
> varrun 125M 68K 125M 1% /var/run
> varlock 125M 0 125M 0% /var/lock
> udev 125M 44K 125M 1% /dev
> devshm 125M 0 125M 0% /dev/shm
> df: `/mnt/gluster': Transport endpoint is not connected
> # mount
> /dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
> proc on /proc type proc (rw,noexec,nosuid,nodev)
> /sys on /sys type sysfs (rw,noexec,nosuid,nodev)
> varrun on /var/run type tmpfs (rw,noexec,nosuid,nodev,mode=0755)
> varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
> udev on /dev type tmpfs (rw,mode=0755)
> devshm on /dev/shm type tmpfs (rw)
> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> securityfs on /sys/kernel/security type securityfs (rw)
> /etc/glusterfs/glusterfs.vol on /mnt/gluster type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
>
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
>
> Here is a copy of debug logs:
> [2009-10-01 08:16:15] D [glusterfsd.c:354:_get_specfp] glusterfs:
> loading volume file /etc/glusterfs/glusterfs.vol
>
> ===============================================================================
> Version : glusterfs 2.0.6 built on Aug 31 2009 20:14:31
> TLA Revision : v2.0.6
> Starting Time: 2009-10-01 08:16:15
> Command line : glusterfs --log-level=DEBUG
> --volfile=/etc/glusterfs/glusterfs.vol /mnt/gluster/
> PID : 17884
> System name : Linux
> Nodename : cj-cpt-molb01
> Kernel Release : 2.6.24-24-server
> Hardware Identifier: i686
>
> Given volfile:
>
> +------------------------------------------------------------------------------+
> 1: ### Add client feature and attach to remote subvolume of server1
> 2: volume brick1
> 3: type protocol/client
> 4: option transport-type tcp
> 5: option remote-host 172.19.45.102 # IP address of the remote brick
> 6: option remote-subvolume brick # name of the remote volume
> 7: end-volume
> 8:
> 9: ### Add client feature and attach to remote subvolume of server2
> 10: volume brick2
> 11: type protocol/client
> 12: option transport-type tcp
> 13: option remote-host 172.19.45.103 # IP address of the remote brick
> 14: option remote-subvolume brick # name of the remote volume
> 15: end-volume
> 16:
> 17: volume replicate
> 18: type cluster/replicate
> 19: subvolumes brick1 brick2
> 20: end-volume
>
>
> +------------------------------------------------------------------------------+
> [2009-10-01 08:16:15] D [glusterfsd.c:1205:main] glusterfs: running in
> pid 17884
> [2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick1: defaulting
> frame-timeout to 30mins
> [2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick1: defaulting
> ping-timeout to 10
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick2: defaulting
> frame-timeout to 30mins
> [2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick2: defaulting
> ping-timeout to 10
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] N [glusterfsd.c:1224:main] glusterfs: Successfully
> started
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]
> brick1: Connected to 172.19.45.102:6996, attached to remote volume
> 'brick'.
> [2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume
> 'brick1' came back up; going online.
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]
> brick1: Connected to 172.19.45.102:6996, attached to remote volume
> 'brick'.
> [2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume
> 'brick1' came back up; going online.
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]
> brick2: Connected to 172.19.45.103:6996, attached to remote volume
> 'brick'.
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]
> brick2: Connected to 172.19.45.103:6996, attached to remote volume
> 'brick'.
> [2009-10-01 08:17:24] N [client-protocol.c:6246:notify] brick1:
> disconnected
> [2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1:
> connection to 172.19.45.102:6996 failed (Connection refused)
> [2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1:
> connection to 172.19.45.102:6996 failed (Connection refused)
>
>
>
> Any ideas?
>
>
> --
> Adrian Moisey
> Systems Designer | CareerJunction | Better jobs. More often.
> Web: www.careerjunction.co.za | Email: adrian at careerjunction.co.za
> Phone: +27 21 818 8621 | Mobile: +27 82 858 7830 | Fax: +27 21 818 8855