Simon Detheridge
2009-May-13 14:20 UTC
[Gluster-users] Glusterfs-2 locks/hangs on EC2/vtun setup
Hi all,

I'm trying to get a glusterfs cluster working inside Amazon's EC2, using the official Ubuntu 8.10 images. I've compiled glusterfs-2, but I'm using the in-kernel fuse module, as the instances run 2.6.27-3 and the fuse module from glusterfs won't compile against something that recent.

For my test setup I'm trying to get AFR working across two servers, with a third server acting as a client. All 3 servers have the glusterfs volume mounted. After a few hours, the mounted volume locks up on all three servers: ls'ing a directory on the volume, or typing "df -h", hangs and can't even be killed with kill -9. I have to "umount --force" the glusterfs volume to get ls or df to terminate.

The instances communicate with each other over vtun-based tunnels, which I've set up to provide predictable IP addressing between the nodes (the IPs assigned by Amazon are random).

The logs don't show anything useful. The last thing they mention is the handshake that took place a few hours ago. I had disabled the performance translators on the clients but forgot to do so on the server, so I'm currently running the test again with io-threads disabled on the server, and with "mount -o log-level=DEBUG" on the client.

The volumes are not under heavy load at all when they fail. All that's happening is that a script runs every 30 seconds on the client that isn't acting as a storage node, and does the following (see the sketch at the end of this mail):

* Writes a random value to a randomly-named file on the locally mounted volume
* Connects via SSH to one of the storage nodes and reads the file from the volume mounted there
* Complains if the contents of the file are different
* Removes the file
* Repeats for the other node

In order to remount the volume after a failure, I have to "umount --force" and then manually kill the glusterfs process; otherwise the connection just hangs again as soon as I remount.

On each storage node, my glusterfs-client.vol looks like this:

#------------------
volume web_remote_1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.172.10
  option remote-subvolume web_brick
end-volume

volume web_remote_2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.172.11
  option remote-subvolume web_brick
end-volume

volume web_replicate
  type cluster/replicate
  subvolumes web_remote_1 web_remote_2
end-volume
#------------------

On the servers, my glusterfs-server.vol looks like this:

#------------------
volume web
  type storage/posix
  option directory /var/glusterfs/web
end-volume

volume web_locks
  type features/locks
  subvolumes web
end-volume

volume web_brick
  type performance/io-threads
  option autoscaling on
  subvolumes web_locks
end-volume

volume web_server
  type protocol/server
  option transport-type tcp/server
  option client-volume-filename /etc/glusterfs/glusterfs-client-web.vol
  subvolumes web_brick
  option auth.addr.web_brick.allow *
end-volume
#------------------

Does anyone have any ideas why this happens?

Thanks,
Simon

--
Simon Detheridge - CTO, Widgit Software
26 Queen Street, Cubbington, CV32 7NA - Tel: +44 (0)1926 333680
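
P.S. For reference, the check script is roughly equivalent to the sketch below. The mount point (/mnt/web) and the use of the tunnel IPs for SSH are placeholders for illustration, not the exact values in use:

#------------------
#!/usr/bin/env python
# Sketch of the 30-second consistency check described above.
# Assumes the volume is mounted at the same path on every node and
# that passwordless SSH to the storage nodes is set up.
import os
import random
import string
import subprocess
import sys

MOUNT = "/mnt/web"                              # locally mounted glusterfs volume (assumed path)
NODES = ["192.168.172.10", "192.168.172.11"]    # storage nodes, reached over the vtun tunnels

def check(node):
    name = "".join(random.choice(string.ascii_lowercase) for _ in range(12))
    value = "".join(random.choice(string.ascii_lowercase) for _ in range(32))
    path = os.path.join(MOUNT, name)

    # Write a random value to a randomly-named file on the local mount
    with open(path, "w") as f:
        f.write(value)

    # Read the same file back through the storage node's local mount via SSH
    remote = subprocess.check_output(["ssh", node, "cat", path]).decode().strip()

    # Complain if the contents differ, then remove the file
    if remote != value:
        sys.stderr.write("mismatch on %s for %s\n" % (node, name))
    os.remove(path)

if __name__ == "__main__":
    for node in NODES:
        check(node)
#------------------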