Hello all,
I am currently evaluating glusterfs and am therefore using a testbed of four
(real) boxes, two configured as fileservers and two as clients; see the
configs below.
All I have to do to make the fs hang on both clients (sooner or later) is to
run bonnie (a simple filesystem benchmark) constantly on the mounted tree.
I am using glusterfs 2.0.2 from the qa-release tree on a stock 2.6.29.4 kernel
with openSUSE 11.1 as the base system.
The hang shows up within 12 hours of runtime. If you kill the hanging glusterfs
client process and re-mount, everything works again - no reboot of the box is
necessary.
Any ideas?
test script:
#! /bin/bash
(
  cd /test || exit 1   # glusterfs mount point; bail out if it is not there
  while true; do
    bonnie
  done
)
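For reference, bonnie runs with its defaults here; an equivalent call with
explicit arguments (assuming the classic bonnie flags -d for the scratch
directory and -s for the file size in MB - a size well above the client's RAM
avoids measuring only the page cache) would be:

bonnie -d /test -s 2048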
server config (identical on both servers):
volume posix
  type storage/posix
  option directory /p3
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume p3
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.p3.allow *
  subvolumes p3
end-volume
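In case anyone wants to reproduce this: a server volfile like the above is
simply handed to glusterfsd, roughly like this (the volfile and log paths are
only examples - adjust to your layout):

glusterfsd -f /etc/glusterfs/server.vol -l /var/log/glusterfs/server.log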
client config (identical on both clients):
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.101
  option remote-subvolume p3
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.102
  option remote-subvolume p3
end-volume

volume replicate
  type cluster/replicate
  option data-self-heal on
  option metadata-self-heal on
  option entry-self-heal on
  subvolumes remote1 remote2
end-volume

volume readahead
  type performance/read-ahead
  # option page-size 1MB       # unit in bytes
  option page-count 8          # cache per file = (page-count x page-size)
  subvolumes replicate
end-volume

volume writebehind
  type performance/write-behind
  # option aggregate-size 1MB
  # option window-size 1MB
  option cache-size 128MB
  option flush-behind on
  subvolumes readahead
end-volume

volume cache
  type performance/io-cache
  option cache-size 512MB
  subvolumes writebehind
end-volume
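The client side is then mounted at /test (the mount point used by the test
script) roughly like this - again, the paths are only examples; in the 2.0
series either the glusterfs binary or mount -t glusterfs with the volfile as
the "device" should work:

glusterfs -f /etc/glusterfs/client.vol -l /var/log/glusterfs/client.log /test
# or equivalently:
# mount -t glusterfs /etc/glusterfs/client.vol /test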
Btw: according to the logs, the commented-out options are not recognised ...
--
Regards,
Stephan
Stephan von Krawczynski wrote:
> Hello all,
>
> I am evaluating glusterfs currently and therefore use a testbed consisting of
> four (real) boxes, two configured as fileservers, two as clients, see configs
> below.
> All I have to do to make the fs hang on both clients (sooner or later) is to
> run bonnie (a simple fs check program) constantly on the mounted tree.

I'm seeing similar problems with 2.0.1 -- any log entries? I get a lot of
these in glusterfs.log when mine hangs:

[2009-06-11 05:10:51] E [client-protocol.c:292:call_bail] zircon: bailing out frame STATFS(15) frame sent = 2009-06-11 04:40:50. frame-timeout = 1800
[2009-06-11 05:10:52] W [client-protocol.c:5869:protocol_client_interpret] zircon: no frame for callid=1913585 type=4 op=29
[2009-06-11 05:10:52] W [client-protocol.c:5869:protocol_client_interpret] zircon: no frame for callid=2706745 type=4 op=40

My setup is very close to yours -- using replicate, iocache, readahead, and
writebehind on the client and iothreads on the servers.

The thread below sounds like my problem, but I don't have autoscaling
(explicitly) turned on:

http://www.mail-archive.com/gluster-devel at nongnu.org/msg06140.html

...so I'm wondering if it could be something else.

-Matt
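P.S. Not a fix, but maybe useful for narrowing it down: the call_bail entries
above mean the client gave up waiting for a reply after frame-timeout seconds
(1800 here). If I remember the option name correctly, that timeout can be set
per protocol/client volume, e.g. (using your remote1 volume as the example):

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.101
  option remote-subvolume p3
  option frame-timeout 600   # seconds; lower makes bails show up sooner, higher only masks them
end-volume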