[Apologies if this gets posted twice] Hi, I can't seem to get flush-behind to flush behind and I was wondering if anyone would be kind enough to lend some brain cells to help me see what I'm missing. I'm using glusterfs 2.0.1 on Debian/x86. I've got a "cluster/ replicate" volume replicated between two servers that are connected (over a VPN) over a slooooow (ADSL) link. Synchronous creation of large files over such a link is very slow, so I added the "performance/write-behind" layer with "option flush-behind on". When I write a file to the volume using cp I see the write()s returning implausibly quickly for the speed of the link, as expected, but the final close() still suffers a long delay as though I'd never specified flush-behind. The following strace shows what happens when a 100kB file gets copied into the gluster (which is mounted on /mnt/ tmp). strace prints timestamps at syscall entry. ===========quince:/mnt# strace -ttt cp ~/typically-sized-files/hundredkay /mnt/ tmp/100k 1246310229.679320 execve("/bin/cp", ["cp", "/root/typically-sized- files/hund"..., "/mnt/tmp/100k"], [/* 17 vars */]) = 0 [...] 1246310230.343767 open("/root/typically-sized-files/hundredkay", O_RDONLY|O_LARGEFILE) = 3 1246310230.343818 fstat64(3, {st_mode=S_IFREG|0644, st_size=102400, ...}) = 0 1246310230.343861 open("/mnt/tmp/100k", O_WRONLY|O_CREAT|O_EXCL| O_LARGEFILE, 0644) = 4 1246310230.563629 fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 1246310230.563694 read(3, "MZ\220\0\3\0\0\0\4\0\0\0\377\377\0\0\270\0 \0\0\0\0\0\0@\0\0\0\0\0\0\0\0"..., 8192) = 8192 1246310230.563758 write(4, "MZ\220\0\3\0\0\0\4\0\0\0\377\377\0\0\270\0 \0\0\0\0\0\0@\0\0\0\0\0\0\0\0"..., 8192) = 8192 1246310231.480604 read(3, "M\364_^d\211\r\0\0\0\0\311\303\213\3013\311 \211H\4\211H\10\211H\f\307@\20\1\0\0\0"..., 8192) = 8192 1246310231.480667 write(4, "M\364_^d\211\r\0\0\0\0\311\303\213\3013 \311\211H\4\211H\10\211H\f\307@\20\1\0\0\0"..., 8192) = 8192 1246310231.480940 read(3, "\377\377C;_\10|\360\213E\f\211u\0109p\10\17 \216\220\0\0\0\213E\f\213M\10\213@\f\213"..., 8192) = 8192 1246310231.480990 write(4, "\377\377C;_\10|\360\213E\f\211u\0109p\10 \17\216\220\0\0\0\213E\f\213M\10\213@\f\213"..., 8192) = 8192 [...] 1246310231.481945 read(3, "\312u\1\0\336u\1\0\352u\1\0\374u\1\0\16v\1 \0\36v\1\0,v\1\0:v\1\0H"..., 8192) = 8192 1246310231.481986 write(4, "\312u\1\0\336u\1\0\352u\1\0\374u\1\0\16v\1 \0\36v\1\0,v\1\0:v\1\0H"..., 8192) = 8192 1246310231.482063 read(3, "\377\271^\0\377\302v\0\322\247m\0\364\313 \226\0\377\300h\0\374\277h\0\372\304x\0\340\261p\0\341"..., 8192) = 4096 1246310231.482102 write(4, "\377\271^\0\377\302v\0\322\247m\0\364\313 \226\0\377\300h\0\374\277h\0\372\304x\0\340\261p\0\341"..., 4096) = 4096 1246310231.482174 close(4) = 0 1246310235.602280 close(3) = 0 [...] 1246310235.602419 exit_group(0) = ? =========== This delay at close (4.2 seconds here) gets longer as the file gets bigger - exactly as you'd expect if write()ing was asynchronous but close()ing/flush()ing was synchronous. Here's the (slightly more abbreviated) timings for a 1MB file: ==========quince:/mnt# strace -ttt cp ~/typically-sized-files/onemeg /mnt/tmp/ 1246315526.483531 execve("/bin/cp", ["cp", "/root/typically-sized- files/onem"..., "/mnt/tmp/"], [/* 17 vars */]) = 0 [...] 1246315527.034448 read(3, "MZ\220\0\3\0\0\0\4\0\0\0\377\377\0\0\270\0 \0\0\0\0\0\0@\0\0\0\0\0\0\0\0"..., 8192) = 8192 1246315527.034515 write(4, "MZ\220\0\3\0\0\0\4\0\0\0\377\377\0\0\270\0 \0\0\0\0\0\0@\0\0\0\0\0\0\0\0"..., 8192) = 8192 1246315527.500059 read(3, "M\364_^d\211\r\0\0\0\0\311\303\213\3013\311 \211H\4\211H\10\211H\f\307@\20\1\0\0\0"..., 8192) = 8192 1246315527.500113 write(4, "M\364_^d\211\r\0\0\0\0\311\303\213\3013 \311\211H\4\211H\10\211H\f\307@\20\1\0\0\0"..., 8192) = 8192 [...] 1246315527.522606 write(4, "\362\303\34:?/\5+\351\262\277\35\2413\5(eP \0\357\223\226w\373\311r4\325\347\215\30\r\270"..., 8192) = 8192 1246315527.522733 read(3, ""..., 8192) = 0 1246315527.522770 close(4) = 0 1246315563.780185 close(3) = 0 ========== That's 36.26 seconds or ~28kB/s, which is typical of my upload speeds, strongly suggesting glusterfs is copying the file before returning from close(). Each of the two slow-connected machines is a gluster server with a localhost gluster client. I'm using server-based replication (only out of desperation when client-based flush-behind didn't work). This latest config does the write-back on the server, but I've also tried doing it on the client, with the same problem. I'll include my most recent server-replication-based/server-write-delaying config because it's where those straces came from. Client config (identical both ends): =======volume loopback type protocol/client option transport-type tcp option remote-host 127.0.0.1 option remote-subvolume delayedwrite # name of the remote volume end-volume ======== Server config. I'm just giving one side because they differ only by the IP address and hostname. ========volume shared-raw type storage/posix # POSIX FS translator option directory /mnt/big/fileserving/gluster-test/shared # Export this directory end-volume volume quince-shared type features/locks subvolumes shared-raw end-volume volume remote type protocol/client option transport-type tcp option remote-host 10.8.0.3 # aka kumquat option remote-subvolume kumquat-shared # name of the remote volume end-volume volume replicated type cluster/replicate subvolumes quince-shared remote option read-subvolume quince-shared end-volume volume threads type performance/io-threads subvolumes replicated end-volume volume delayedwrite type performance/write-behind option flush-behind on option cache-size 4MB subvolumes threads end-volume volume server type protocol/server option transport-type tcp subvolumes delayedwrite quince-shared option auth.addr.delayedwrite.allow 127.0.0.1 # Allow access to replicated volume from just the local client option auth.addr.quince-shared.allow 10.8.0.3 # Allow access to lockified tree from kumquat end-volume ========= Logs show nothing unusual, and small files replicate between sites just fine with acceptable delay. I've unmounted/stopped/started/ remounted everything every time I've changed the configs, so these straces definitely belong to these configs. So obviously the question is: why am I seeing delays on close() even though I've got "option flush-behind on"? Is there something else I need to have in my config? What's the best way to debug this if I've done nothing obviously wrong? For what it's worth, a little more about my setup: got two offices full of secretaries, connected to t'Internet by ADSL links (Infiniband it certainly ain't!). The secretaries create new Word docs, read each other's docs and move files around, but never edit anything. One office is currently accessing the other over Samba over ADSL, which is predictably unusably slow. I'm trying to use GlusterFS to replicate the tree between the two offices to provide both fast local reads and some level of link death resilience, and I'm hoping to layer Samba on top either end. I'll resolve conflicts manually because they're so rare. I seriously considered Coda first, but Gluster just looks so much tidier and more lively and current as a project, and - except for this one problem - seems to do exactly what I want. Thanks in advance for any light shed, Zak