Hello all,

I am currently evaluating glusterfs and therefore use a testbed consisting of
four (real) boxes, two configured as fileservers and two as clients; see the
configs below.

All I have to do to make the fs hang on both clients (sooner or later) is to
run bonnie (a simple fs benchmark) constantly on the mounted tree. I am using
glusterfs 2.0.2 from the qa-release tree on a stock 2.6.29.4 kernel with
openSUSE 11.1 as base. The hang shows up within 12 hours of runtime. If you
kill the hanging glusterfs client process and re-mount, everything works
again - no restart of the box necessary.

Any ideas?

test script:

#!/bin/bash
(
  cd /test   # glusterfs mount point
  while true; do
    bonnie
  done
)

server config (identical on both servers):

volume posix
  type storage/posix
  option directory /p3
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume p3
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.p3.allow *
  subvolumes p3
end-volume

client config (identical on both clients):

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.101
  option remote-subvolume p3
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.102
  option remote-subvolume p3
end-volume

volume replicate
  type cluster/replicate
  option data-self-heal on
  option metadata-self-heal on
  option entry-self-heal on
  subvolumes remote1 remote2
end-volume

volume readahead
  type performance/read-ahead
# option page-size 1MB          # unit in bytes
  option page-count 8           # cache per file = (page-count x page-size)
  subvolumes replicate
end-volume

volume writebehind
  type performance/write-behind
# option aggregate-size 1MB
# option window-size 1MB
  option cache-size 128MB
  option flush-behind on
  subvolumes readahead
end-volume

volume cache
  type performance/io-cache
  option cache-size 512MB
  subvolumes writebehind
end-volume

Btw: according to the logs, the commented-out options are not recognised ...

--
Regards,
Stephan
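P.S.: for reference, the "kill and re-mount" recovery is roughly the following
on a client box. The mount point /test is from the test script above; the
vol-file path is only a guess, adjust it to wherever your client config lives:

# kill the hung glusterfs client process serving the /test mount
pkill -f 'glusterfs.*/test'
# drop the stale mount and mount the volume again
umount -l /test
glusterfs -f /etc/glusterfs/client.vol /test   # vol-file path is an assumption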
Stephan von Krawczynski wrote:
> Hello all,
>
> I am currently evaluating glusterfs and therefore use a testbed consisting of
> four (real) boxes, two configured as fileservers and two as clients; see the
> configs below.
> All I have to do to make the fs hang on both clients (sooner or later) is to
> run bonnie (a simple fs benchmark) constantly on the mounted tree.

I'm seeing similar problems with 2.0.1 -- any log entries? I get a lot of
these in glusterfs.log when mine hangs:

[2009-06-11 05:10:51] E [client-protocol.c:292:call_bail] zircon: bailing out frame STATFS(15) frame sent = 2009-06-11 04:40:50. frame-timeout = 1800
[2009-06-11 05:10:52] W [client-protocol.c:5869:protocol_client_interpret] zircon: no frame for callid=1913585 type=4 op=29
[2009-06-11 05:10:52] W [client-protocol.c:5869:protocol_client_interpret] zircon: no frame for callid=2706745 type=4 op=40

My setup is very close to yours -- replicate, io-cache, read-ahead, and
write-behind on the clients and io-threads on the servers. The thread below
sounds like my problem, but I don't have autoscaling (explicitly) turned on:

http://www.mail-archive.com/gluster-devel at nongnu.org/msg06140.html

...so I'm wondering if it could be something else.

-Matt
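P.S.: to compare quickly, grepping the client log for the bail messages should
show whether you hit the same thing when your mount hangs. The path below is
only a guess -- use whatever your -l / --log-file option points to:

# look for bailed-out frames and orphaned replies in the client log
grep -E 'call_bail|no frame for callid' /var/log/glusterfs/glusterfs.log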