I just got started with glusterfs. I read the docs over the weekend
and today created a simple setup: two servers exporting a brick and
one client mounting them with AFR. I am seeing very poor write
performance on a dd test, e.g.:
time dd if=/dev/zero of=./local-file bs=8192 count=125000
presumably due to a very large number of write operations (because
when I increase the blocksize to 64K, the performance increases by
2x). I enabled the writebehind translator but see no improvement. I
then enabled a trace translator on both sides of the writebehind and
seem to be seeing that write-behind is not batching any of the
operations.
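For reference, the 64K comparison mentioned above looked roughly like this (a sketch: the count is chosen to keep the total at the same 1,024,000,000 bytes, and the output file name is just an example):

time dd if=/dev/zero of=./local-file-64k bs=65536 count=15625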
Server vol file:
volume posix
  type storage/posix
  option directory /mnt/glusterfsd-export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume
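A brick defined by this file is typically started with something along these lines (the volfile path shown here is only an example, not taken from the setup above):

glusterfsd -f /etc/glusterfs/glusterfsd-server.vol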
Client vol file:
volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host web-1
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host web-2
  option remote-subvolume brick
end-volume

volume replicate
  type cluster/replicate
  subvolumes remote1 remote2
end-volume

volume trace-below
  type debug/trace
  subvolumes replicate
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 1MB
  subvolumes trace-below
end-volume

volume trace-above
  type debug/trace
  subvolumes writebehind
end-volume
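The client side is mounted from this file with something along these lines (the volfile path is an example; the mount point and log file are inferred from the log names that appear below):

glusterfs -f /etc/glusterfs/glusterfs-client.vol -l /var/log/glusterfs/mnt-glusterfs-mount.log /mnt/glusterfs-mount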
With this configuration, I re-ran my dd test but with only
count=100. The log shows:
[root@web-3 glusterfs-mount]# grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep above | wc
    245    3591   42117
[root@web-3 glusterfs-mount]# grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep below | wc
    252    3678   43095
So, there are as many writes to trace-below as trace-above.
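To count only the actual write calls at each trace point (this assumes the trace translator includes the fop name, e.g. writev, in each log line), something like this narrows it down:

grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep above | grep -c writev
grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep below | grep -c writev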
What am I not understanding?
Thanks!
Barry
Did you delete the output file after running your dd test? I saw a significant
improvement when I modified my dd test to delete the output file after each run.
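For what it's worth, the pattern I used looks roughly like this (the file name and the number of runs are just examples):

for run in 1 2 3; do
  rm -f ./local-file
  time dd if=/dev/zero of=./local-file bs=8192 count=125000
done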
Ed
-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Barry Jaspan
Sent: Monday, June 29, 2009 2:49 PM
To: gluster-users at gluster.org
Subject: [Gluster-users] AFR, writebehind, and debug/trace
Barry Jaspan wrote:
> I enabled the writebehind translator but see no improvement. I then
> enabled a trace translator on both sides of the writebehind and seem
> to be seeing that write-behind is not batching any of the operations.

If by "batching" you mean aggregating smaller requests so they are sent as one large request, then no, write-behind does not do that. It does just what its name says: it writes behind the actual write request, acknowledging the write to the application before the servers have responded. There is no write buffering going on at this point. We have plans to incorporate write buffering into io-cache for the 2.1 release.

-Shehjar
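For readers wondering where io-cache sits, a minimal sketch of how it is declared in a client volfile, stacked above a write-behind volume such as the one in the configuration above (the cache-size value is only an example; in current releases io-cache caches reads, and the write buffering mentioned above is still planned):

volume iocache
  type performance/io-cache
  option cache-size 64MB
  subvolumes writebehind
end-volume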
On Mon, Jun 29, 2009 at 23:49, Barry Jaspan <barry.jaspan at acquia.com> wrote:
> I am seeing very poor write performance on a dd test [...] when I
> increase the blocksize to 64K, the performance increases by 2x.

I didn't want to enter these threads because I may sound a bit pessimistic, but here's my experience with glusterfs. I needed a simple NFS replacement for the moment (but I guess it's just the same for any application, except that there aren't alternatives). With the kernel fuse module _everything_ was dirt poor, basically useless. Replacing it with the GlusterFS-patched one improved performance around 5-fold. Still, I have seen useless write performance below a 64k block size (3MB/s, in contrast to 60MB/s), and I have found no solution; the only change that made any difference was _not_ using write-behind, which slowed writes further, to about half that speed (around 2MB/s). Read performance is excellent. I am about to switch to iSCSI (Linux at both ends, software components only), which does not seem to have this problem. I guess the culprit is not GlusterFS but FUSE, but I see no workaround for it.

--
byte-byte, grin
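If anyone wants to reproduce the knee, a quick sketch of sweeping block sizes on a mount (the file name and the 256MB total per run are my own choices):

for bs in 4096 8192 16384 32768 65536 131072; do
  rm -f ./bs-test-file
  count=$((268435456 / bs))   # keep every run at the same 256MB total
  echo "bs=$bs count=$count"
  dd if=/dev/zero of=./bs-test-file bs=$bs count=$count 2>&1 | tail -1
done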
----- "Barry Jaspan" <barry.jaspan at acquia.com> wrote:> I just got started with glusterfs. I read the docs over the weekend > > and today created a simple setup: two servers exporting a brick and > one client mounting them with AFR. I am seeing very poor write > performance on a dd test, e.g.: > > time dd if=/dev/zero of=./local-file bs=8192 count=125000Please try setting the "option cache-size 1MB" option in write-behind. As Shehjar said, write-behind does not aggregate requests, but merely sends a reply to the application before the reply from the server has come back. For reference, this is the kind of performance we get on our test cluster with the same configuration: http://dev.gluster.com/~vikas/dd.simple-afr-2.png Vikas -- Engineer - http://gluster.com/ A: Because it messes up the way people read text. Q: Why is a top-posting such a bad thing? --