I just got started with glusterfs. I read the docs over the weekend and today created a simple setup: two servers exporting a brick and one client mounting them with AFR. I am seeing very poor write performance on a dd test, e.g.:

time dd if=/dev/zero of=./local-file bs=8192 count=125000

presumably due to a very large number of write operations (because when I increase the blocksize to 64K, the performance increases by 2x). I enabled the writebehind translator but see no improvement. I then enabled a trace translator on both sides of the writebehind and seem to be seeing that write-behind is not batching any of the operations.

Server vol file:

volume posix
  type storage/posix
  option directory /mnt/glusterfsd-export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

Client vol file:

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host web-1
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host web-2
  option remote-subvolume brick
end-volume

volume replicate
  type cluster/replicate
  subvolumes remote1 remote2
end-volume

volume trace-below
  type debug/trace
  subvolumes replicate
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 1MB
  subvolumes trace-below
end-volume

volume trace-above
  type debug/trace
  subvolumes writebehind
end-volume

With this configuration, I re-ran my dd test but with only count=100. The log shows:

[root@web-3 glusterfs-mount]# grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep above | wc
    245    3591   42117
[root@web-3 glusterfs-mount]# grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep below | wc
    252    3678   43095

So, there are as many writes to trace-below as trace-above.

What am I not understanding?

Thanks!

Barry
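P.S. For completeness, the greps above count every trace line (lookups, opens, flushes, and so on), not only the writes. A count restricted to the write calls would look something like the sketch below; the "writ" pattern is only a guess at how the trace translator labels the fop, so adjust it to whatever actually appears in the log:

# count only write-related trace lines above and below write-behind
# (the pattern "writ" is a guess at the fop label; verify against the log)
grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep above | grep -ci writ
grep trace /var/log/glusterfs/mnt-glusterfs-mount.log | grep below | grep -ci writ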
Did you delete the output file after running your dd test? I saw a significant improvement when I modified my dd test to delete the output file after each run.

Ed
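P.S. Roughly what I mean is a loop like the one below, removing the out-file between runs; the file name and byte counts are illustrative only, not taken from your test:

# write the same total amount at two block sizes, deleting the out-file
# between runs so every run starts from a fresh file
for bs in 8192 65536; do
    rm -f ./local-file
    time dd if=/dev/zero of=./local-file bs=$bs count=$((1024000000 / bs))
done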
Barry Jaspan wrote:
> I just got started with glusterfs. I read the docs over the weekend
> and today created a simple setup: two servers exporting a brick and
> one client mounting them with AFR. I am seeing very poor write
> performance on a dd test, e.g.:
>
> time dd if=/dev/zero of=./local-file bs=8192 count=125000
>
> presumably due to a very large number of write operations (because
> when I increase the blocksize to 64K, the performance increases by
> 2x). I enabled the writebehind translator but see no improvement. I
> then enabled a trace translator on both sides of the writebehind and
> seem to be seeing that write-behind is not batching any of the
> operations.

If by batching you mean aggregation of smaller requests into one large request, then no, write-behind does not do that. It does just what its name says: it writes behind the actual write request, i.e. it acknowledges the write to the application before the servers have replied. There is no write buffering going on at this point. We have plans to incorporate write buffering in io-cache for the 2.1 release.

-Shehjar
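P.S. For reference, io-cache already exists today as a read-side caching translator (performance/io-cache), so it will not help your writes yet. Once write buffering lands there, stacking it on the client could look roughly like the sketch below; the placement above write-behind and the cache-size value are illustrative only, not a tested recommendation:

volume iocache
  type performance/io-cache
  option cache-size 64MB       # illustrative value
  subvolumes writebehind       # trace volumes from the debug setup omitted
end-volume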
On Mon, Jun 29, 2009 at 23:49, Barry Jaspan <barry.jaspan at acquia.com> wrote:
> I just got started with glusterfs. I read the docs over the weekend and
> today created a simple setup: two servers exporting a brick and one client
> mounting them with AFR. I am seeing very poor write performance on a dd
> test, e.g.:
>
> time dd if=/dev/zero of=./local-file bs=8192 count=125000
>
> presumably due to a very large number of write operations (because when I
> increase the blocksize to 64K, the performance increases by 2x).

I didn't want to enter these threads because I may sound a bit pessimistic, but here is my experience with glusterfs. I needed a simple NFS replacement at the moment (but I guess it is just the same for any application, with the exception that then there aren't alternatives).

With the kernel fuse module _everything_ was dirt poor, basically useless. Replacing it with the GlusterFS-patched one improved performance around 5-fold. Still, I have experienced unusable write performance below a 64k block size (3MB/s, in contrast to 60MB/s). I have found no solution, apart from _not_ using writeback, which just slowed it down to about half that speed (around 2MB/s). Read performance is excellent.

I am about to use iSCSI (Linux at both ends, software components only), which does not seem to have that problem. I guess it's not GlusterFS but FUSE, but I see no workaround for it.

--
byte-byte,
    grin
----- "Barry Jaspan" <barry.jaspan at acquia.com> wrote:> I just got started with glusterfs. I read the docs over the weekend > > and today created a simple setup: two servers exporting a brick and > one client mounting them with AFR. I am seeing very poor write > performance on a dd test, e.g.: > > time dd if=/dev/zero of=./local-file bs=8192 count=125000Please try setting the "option cache-size 1MB" option in write-behind. As Shehjar said, write-behind does not aggregate requests, but merely sends a reply to the application before the reply from the server has come back. For reference, this is the kind of performance we get on our test cluster with the same configuration: http://dev.gluster.com/~vikas/dd.simple-afr-2.png Vikas -- Engineer - http://gluster.com/ A: Because it messes up the way people read text. Q: Why is a top-posting such a bad thing? --