Matthew Davis
2009-Mar-11 17:53 UTC
[Gluster-users] Unexpected AFR self-heal behavior (file contents not synced).
I have a simple test setup where it appears AFR is not syncing the contents of files. The setup (see configs below) is server-side AFR with a client running on each machine accessing the local server instance. I'm using glusterfs-2.0.0rc4 (had similar behavior with 1.3.12). I've got to be missing something, but I cannot see what it is. My goal is to get two HA-linux servers working with some of the directories replicated between them without using DRBD or rsync.

Starting up both servers, I get good connections between them (no errors logged). The client on both machines can mount without problems as well.

Here is the file manipulation sequence and the resulting problem(s):

A = Server A (exporting /opt/gluster/ds, client mounted to /mnt/gluster)
B = Server B (exporting /opt/gluster/ds, client mounted to /mnt/gluster)

A)    echo "hi" > /mnt/gluster/x
Both) 'cat /mnt/gluster/x' shows "hi" (AFR works at this point)
B)    killall glusterfsd, unmount /mnt/gluster
A)    echo "bye" > /mnt/gluster/x
A)    echo "hi" > /mnt/gluster/y
B)    start up server, remount /mnt/gluster
Both) ls -lR /mnt/gluster
A)    cat both files on /mnt/gluster/ shows x to contain "bye", and y to contain "hi" (looks ok)
B)    cat both files on /mnt/gluster/ and they are empty! (even though ls shows the same file sizes as on Server A)
A)    cat both files in /opt/gluster/ds/ shows the expected contents
B)    cat both files in /opt/gluster/ds/ shows empty!
A)    both files in /opt/gluster/ds have trusted.glusterfs.afr.data-pending=0x0000000000000000 and trusted.glusterfs.afr.metadata-pending=0x0000000000000000
B)    both files in /opt/gluster/ds have the same values as on Server A

Here are my config files:

--------------------------------------------------------------------------------
Server A server.vol
--------------------------------------------------------------------------------
volume data-posix
  type storage/posix                        # POSIX FS translator
  option directory /opt/gluster/ds          # Export this directory
end-volume

volume data-locks
  type features/posix-locks
  option mandatory-locks on
  subvolumes data-posix
end-volume

volume twin-data-locks
  type protocol/client
  option transport-type tcp/client
  option remote-host 1.1.1.2
  option remote-port 6996
  option transport-timeout 10
  option remote-subvolume data-locks
end-volume

volume data-replicate
  type cluster/replicate
  subvolumes data-locks twin-data-locks
end-volume

volume data
  type performance/io-threads
  option thread-count 8
  subvolumes data-replicate
end-volume

volume server
  type protocol/server
  option transport-type tcp/server               # For TCP/IP transport
  option transport.socket.bind-address 0.0.0.0
  option transport.socket.listen-port 6996
  option auth.addr.data-locks.allow *            # Allow access to "brick" volume
  option auth.addr.data.allow 127.0.0.1          # Allow access to "brick" volume
  subvolumes data
end-volume

--------------------------------------------------------------------------------
Server A client.vol
--------------------------------------------------------------------------------
volume client
  type protocol/client
  option transport-type tcp/client       # for TCP/IP transport
  option remote-host 127.0.0.1           # IP address of the remote brick
  option remote-port 6996                # default server port is 6996
  option remote-subvolume data
end-volume

--------------------------------------------------------------------------------
Server B server.vol
--------------------------------------------------------------------------------
volume data-posix
  type storage/posix                        # POSIX FS translator
  option directory /opt/gluster/ds          # Export this directory
end-volume

volume data-locks
  type features/posix-locks
  option mandatory-locks on
  subvolumes data-posix
end-volume

volume twin-data-locks
  type protocol/client
  option transport-type tcp/client
  option remote-host 1.1.1.1
  option remote-port 6996
  option transport-timeout 10
  option remote-subvolume data-locks
end-volume

volume data-replicate
  type cluster/replicate
  subvolumes data-locks twin-data-locks
end-volume

volume data
  type performance/io-threads
  option thread-count 8
  subvolumes data-replicate
end-volume

volume server
  type protocol/server
  option transport-type tcp/server               # For TCP/IP transport
  option transport.socket.bind-address 0.0.0.0
  option transport.socket.listen-port 6996
  option auth.addr.data-locks.allow *            # Allow access to "brick" volume
  option auth.addr.data.allow 127.0.0.1          # Allow access to "brick" volume
  subvolumes data
end-volume

--------------------------------------------------------------------------------
Server B client.vol
--------------------------------------------------------------------------------
volume client
  type protocol/client
  option transport-type tcp/client       # for TCP/IP transport
  option remote-host 127.0.0.1           # IP address of the remote brick
  option remote-port 6996                # default server port is 6996
  option remote-subvolume data
end-volume

Thanks for any help!

Matthew Davis
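P.S. For anyone trying to reproduce the check: the pending values above were read straight off the backend files in /opt/gluster/ds. A getfattr call along these lines (standard attr tools, run as root since the attributes live in the trusted namespace) dumps all extended attributes in hex:

  getfattr -d -m . -e hex /opt/gluster/ds/x /opt/gluster/ds/y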
Vikas Gorur
2009-Mar-12 07:42 UTC
[Gluster-users] Unexpected AFR self-heal behavior (file contents not synced).
2009/3/11 Matthew Davis <mdavis at helius.com> wrote:

<snip>

In your configuration, the order of the replicate subvolumes on A and B is not the same: each server lists its own local brick first. It is crucial that the subvolumes be listed in the same order on both servers. So the correct configuration would be:

Server A:

volume replicate
  subvolumes data-locks twin-data-locks

Server B:

volume replicate
  subvolumes twin-data-locks data-locks

Vikas
--
Engineer - Z Research
http://gluster.com/
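Spelled out against the volume names in Matthew's posted vol files, the corrected cluster/replicate sections would look something like this (only the subvolumes order on Server B changes; everything else stays as posted):

Server A server.vol:

volume data-replicate
  type cluster/replicate
  subvolumes data-locks twin-data-locks
end-volume

Server B server.vol:

volume data-replicate
  type cluster/replicate
  subvolumes twin-data-locks data-locks
end-volume

With this ordering both replicate instances see the same brick first: Server A's data-locks (which Server B reaches through its twin-data-locks client volume), followed by Server B's local brick.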