I just got started with glusterfs. I read the docs over the weekend and today created a simple setup: two servers exporting a brick and one client mounting them with AFR. I am seeing very poor write performance on a dd test, e.g.:

time dd if=/dev/zero of=./local-file bs=8192 count=125000

presumably due to a very large number of write operations (because when I increase the blocksize to 64K, the performance increases by 2x). I enabled the writebehind translator but see no improvement. I then enabled a trace translator on both sides of the writebehind and seem to be seeing that write-behind is not batching any of the operations.

Server vol file:

volume posix
  type storage/posix
  option directory /mnt/glusterfsd-export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

Client vol file:

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host web-1
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host web-2
  option remote-subvolume brick
end-volume

volume replicate
  type cluster/replicate
  subvolumes remote1 remote2
end-volume

volume trace-below
  type debug/trace
  subvolumes replicate
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 1MB
  subvolumes trace-below
end-volume

volume trace-above
  type debug/trace
  subvolumes writebehind
end-volume

With this configuration, I re-ran my dd test but with only count=100. The log shows:

[root@web-3 glusterfs-mount]# grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep above | wc
    245    3591   42117
[root@web-3 glusterfs-mount]# grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep below | wc
    252    3678   43095

So, there are as many writes to trace-below as trace-above.

What am I not understanding?

Thanks!

Barry
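P.S. For completeness, the greps above count every trace line (lookups, opens, flushes, and so on), not only the writes. A count restricted to the write calls would look something like the sketch below; the "writ" pattern is only a guess at how the trace translator labels the fop, so adjust it to whatever actually appears in the log:

# count only write-related trace lines above and below write-behind
# (the pattern "writ" is a guess at the fop label; verify against the log)
grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep above | grep -ci writ
grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep below | grep -ci writ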
Did you delete the output file after running your dd test? I saw a significant improvement when I modified my dd test to delete the output file after each run.

Ed
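P.S. Roughly what I mean is a loop like the one below, removing the out-file between runs; the file name and byte counts are illustrative only, not taken from your test:

# write the same total amount at two block sizes, deleting the out-file
# between runs so every run starts from a fresh file
for bs in 8192 65536; do
    rm -f ./local-file
    time dd if=/dev/zero of=./local-file bs=$bs count=$((1024000000 / bs))
done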
Barry Jaspan wrote:
> I just got started with glusterfs. I read the docs over the weekend
> and today created a simple setup: two servers exporting a brick and
> one client mounting them with AFR. I am seeing very poor write
> performance on a dd test, e.g.:
>
> time dd if=/dev/zero of=./local-file bs=8192 count=125000
>
> presumably due to a very large number of write operations (because
> when I increase the blocksize to 64K, the performance increases by
> 2x). I enabled the writebehind translator but see no improvement. I
> then enabled a trace translator on both sides of the writebehind and
> seem to be seeing that write-behind is not batching any of the
> operations.

If by batching you mean aggregation of smaller requests into one large request, then no, write-behind does not do that. It does just what its name says: it writes behind the actual write request, i.e. it acknowledges the write to the application before the servers have replied. There is no write buffering going on at this point. We have plans to incorporate write buffering in io-cache for the 2.1 release.

-Shehjar
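P.S. For reference, io-cache already exists today as a read-side caching translator (performance/io-cache), so it will not help your writes yet. Once write buffering lands there, stacking it on the client could look roughly like the sketch below; the placement above write-behind and the cache-size value are illustrative only, not a tested recommendation:

volume iocache
  type performance/io-cache
  option cache-size 64MB       # illustrative value
  subvolumes writebehind       # trace volumes from the debug setup omitted
end-volume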
On Mon, Jun 29, 2009 at 23:49, Barry Jaspan <barry.jaspan at acquia.com> wrote:
> I just got started with glusterfs. I read the docs over the weekend and
> today created a simple setup: two servers exporting a brick and one client
> mounting them with AFR. I am seeing very poor write performance on a dd
> test, e.g.:
>
> time dd if=/dev/zero of=./local-file bs=8192 count=125000
>
> presumably due to a very large number of write operations (because when I
> increase the blocksize to 64K, the performance increases by 2x).

I didn't want to enter these threads because I may sound a bit pessimistic, but here is my experience with glusterfs. I needed a simple NFS replacement at the moment (but I guess it is just the same for any application, with the exception that then there aren't alternatives).

With the kernel fuse module _everything_ was dirt poor, basically useless. Replacing it with the GlusterFS-patched one improved performance around 5-fold. Still, I have experienced unusable write performance below a 64k block size (3MB/s, in contrast to 60MB/s). I have found no solution, apart from _not_ using writeback, which just slowed it down to about half that speed (around 2MB/s). Read performance is excellent.

I am about to use iSCSI (Linux at both ends, software components only), which does not seem to have that problem. I guess it's not GlusterFS but FUSE, but I see no workaround for it.

--
byte-byte,
    grin
----- "Barry Jaspan" <barry.jaspan at acquia.com> wrote:> I just got started with glusterfs. I read the docs over the weekend > > and today created a simple setup: two servers exporting a brick and > one client mounting them with AFR. I am seeing very poor write > performance on a dd test, e.g.: > > time dd if=/dev/zero of=./local-file bs=8192 count=125000Please try setting the "option cache-size 1MB" option in write-behind. As Shehjar said, write-behind does not aggregate requests, but merely sends a reply to the application before the reply from the server has come back. For reference, this is the kind of performance we get on our test cluster with the same configuration: http://dev.gluster.com/~vikas/dd.simple-afr-2.png Vikas -- Engineer - http://gluster.com/ A: Because it messes up the way people read text. Q: Why is a top-posting such a bad thing? --