Hello all,

I am currently evaluating glusterfs and therefore use a testbed consisting of
four (real) boxes, two configured as fileservers and two as clients; see the
configs below.

All I have to do to make the fs hang on both clients (sooner or later) is to
run bonnie (a simple fs benchmark) constantly on the mounted tree. I am using
glusterfs 2.0.2 from the qa-release tree on a stock 2.6.29.4 kernel with
openSUSE 11.1 as base. The hang shows up within 12 hours of runtime. If you
kill the hanging glusterfs client process and re-mount, everything works
again - no restart of the box necessary.

Any ideas?

test script:

#!/bin/bash
(
  cd /test   # glusterfs mount point
  while true; do
    bonnie
  done
)

server config (identical on both servers):

volume posix
  type storage/posix
  option directory /p3
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume p3
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.p3.allow *
  subvolumes p3
end-volume

client config (identical on both clients):

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.101
  option remote-subvolume p3
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.102
  option remote-subvolume p3
end-volume

volume replicate
  type cluster/replicate
  option data-self-heal on
  option metadata-self-heal on
  option entry-self-heal on
  subvolumes remote1 remote2
end-volume

volume readahead
  type performance/read-ahead
# option page-size 1MB          # unit in bytes
  option page-count 8           # cache per file = (page-count x page-size)
  subvolumes replicate
end-volume

volume writebehind
  type performance/write-behind
# option aggregate-size 1MB
# option window-size 1MB
  option cache-size 128MB
  option flush-behind on
  subvolumes readahead
end-volume

volume cache
  type performance/io-cache
  option cache-size 512MB
  subvolumes writebehind
end-volume

Btw: according to the logs, the commented-out options are not recognised ...

--
Regards,
Stephan
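P.S.: for reference, the "kill and re-mount" recovery is roughly the following
on a client box. The mount point /test is from the test script above; the
vol-file path is only a guess, adjust it to wherever your client config lives:

# kill the hung glusterfs client process serving the /test mount
pkill -f 'glusterfs.*/test'
# drop the stale mount and mount the volume again
umount -l /test
glusterfs -f /etc/glusterfs/client.vol /test   # vol-file path is an assumption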
Stephan von Krawczynski wrote:
> Hello all,
>
> I am currently evaluating glusterfs and therefore use a testbed consisting of
> four (real) boxes, two configured as fileservers and two as clients; see the
> configs below.
> All I have to do to make the fs hang on both clients (sooner or later) is to
> run bonnie (a simple fs benchmark) constantly on the mounted tree.

I'm seeing similar problems with 2.0.1 -- any log entries? I get a lot of
these in glusterfs.log when mine hangs:

[2009-06-11 05:10:51] E [client-protocol.c:292:call_bail] zircon: bailing out frame STATFS(15) frame sent = 2009-06-11 04:40:50. frame-timeout = 1800
[2009-06-11 05:10:52] W [client-protocol.c:5869:protocol_client_interpret] zircon: no frame for callid=1913585 type=4 op=29
[2009-06-11 05:10:52] W [client-protocol.c:5869:protocol_client_interpret] zircon: no frame for callid=2706745 type=4 op=40

My setup is very close to yours -- replicate, io-cache, read-ahead, and
write-behind on the clients and io-threads on the servers. The thread below
sounds like my problem, but I don't have autoscaling (explicitly) turned on:

http://www.mail-archive.com/gluster-devel at nongnu.org/msg06140.html

...so I'm wondering if it could be something else.

-Matt
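P.S.: to compare quickly, grepping the client log for the bail messages should
show whether you hit the same thing when your mount hangs. The path below is
only a guess -- use whatever your -l / --log-file option points to:

# look for bailed-out frames and orphaned replies in the client log
grep -E 'call_bail|no frame for callid' /var/log/glusterfs/glusterfs.log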