OS: Debian Linux/4.1, 64-bit build
Hardware: quad-core Xeon X3220, 8GB RAM, dual 7200RPM 1000GB WD hard
drives, 750GB RAID 1 partition set as /gfsvol to be exported, dual
GigE, Juniper EX3200 switch

Fuse libraries: fuse-2.7.3glfs10
Gluster: glusterfs-1.3.10

Running bonnie++ on both machines results in almost identical numbers;
eth1 is reserved wholly for server-to-server communications. Right now,
the only load on these machines comes from my testbed. There are four
tests that give a reasonable indicator of performance:

* loading a WordPress blog and looking at the line:
  <!-- 24 queries. 0.634 seconds. -->
* dd if=/dev/zero of=/gfs/test/out bs=1M count=512
* time tar xjf /gfs/test/linux-2.6.26.1.tar.bz2
* /usr/sbin/bonnie++ /gfs/test/

On the WordPress test, .3 seconds is typical. On various gluster
configurations I've seen between .411 seconds (server-side AFR config
below) and 1.2 seconds with some of the example configurations.
Currently, my client-side AFR config comes in at .5xx seconds rather
consistently.

The second test on the client-side AFR results in:
536870912 bytes (537 MB) copied, 4.65395 s, 115 MB/s

The third test, unpacking a kernel, has ranged from 28 seconds using
server-side AFR to 6+ minutes on some configurations. Currently the
client-side AFR config comes in at about 17 minutes.

The fourth test is a run of bonnie++, which varies from 36 minutes on
the server-side AFR to an 80-minute run on the client-side AFR config.

The current test environment uses both servers as clients and servers.
If I can get reasonable performance, the existing machines will become
clients and the servers will be split onto their own platform, so I
want to make sure I am using TCP for connections to give as close to a
real-world deployment as possible. This means I cannot run a
client-only config.

Baseline WordPress returns .311-.399 seconds
Baseline dd: 536870912 bytes (537 MB) copied, 0.489522 s, 1.1 GB/s
Baseline tar xjf of the kernel: real 0m12.164s
Baseline bonnie++ run on the RAID 1 partition (echo data | bon_csv2txt
for the text reporting):

c1ws1,16G,66470,97,93198,16,42430,6,60253,86,97153,7,381.3,0,16,7534,37,+++++,+++,5957,23,7320,34,+++++,+++,4667,21

So far, the best performance I could manage was server-side AFR with
write-behind/read-ahead on the server, aggregate-size set to 0MB, and
the client side running write-behind/read-ahead. That resulted in:

c1ws2,16G,37636,50,76855,3,17429,2,60376,76,87653,3,158.6,0,16,1741,3,9683,6,2591,3,2030,3,9790,5,2369,3

It was suggested in IRC that client-side AFR would be faster and more
reliable; however, the best result I've managed from multiple attempts
is:

c1ws1,16G,46041,58,76811,2,4603,0,59140,76,86103,3,132.4,0,16,1069,2,4795,2,1308,2,1045,2,5209,2,1246,2

The bonnie++ run from the server-side AFR that produced the best
results I've received to date took 34 minutes. The latest client-side
AFR bonnie++ run took 80 minutes. Based on the website, I would expect
to see better performance than DRBD/GFS, but so far that hasn't been
the case.

It's been suggested that I use unify-rr-afr. In my current setup, it
seems that to do that I would need to break my RAID set, which is my
next step in debugging this. Rather than use RAID 1 on the server, I
would have two bricks on each server, which would allow the use of
unify and the rr scheduler.
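For reference, the client-side layout I have in mind for that
unify-rr-afr test is roughly the following. This is an untested sketch:
the brick-a/brick-b/brick-ns export names are placeholders that each
server would have to export in its own volfile, and the namespace
volume plus the exact scheduler option names are as I remember them
from the 1.3 unify docs, so they need checking.

# untested sketch: two bricks per server, AFR pairs across servers, unify with rr on top
volume server1-a
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.8.1.9
  option remote-subvolume brick-a      # placeholder export name
end-volume

volume server2-a
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.8.1.10
  option remote-subvolume brick-a
end-volume

volume afr-a
  type cluster/afr
  subvolumes server1-a server2-a
end-volume

volume server1-b
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.8.1.9
  option remote-subvolume brick-b      # placeholder export name
end-volume

volume server2-b
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.8.1.10
  option remote-subvolume brick-b
end-volume

volume afr-b
  type cluster/afr
  subvolumes server1-b server2-b
end-volume

volume ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.8.1.9
  option remote-subvolume brick-ns     # small namespace export; assumed to be required by unify
end-volume

volume unify0
  type cluster/unify
  option scheduler rr
  option namespace ns
  subvolumes afr-a afr-b
end-volume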
glusterfs-1.4.0qa32 results in

[Wed Aug 06 02:01:44 2008] [notice] child pid 14025 exit signal Bus error (7)
[Wed Aug 06 02:01:44 2008] [notice] child pid 14037 exit signal Bus error (7)

when apache (not mod_gluster) tries to serve files off the glusterfs
partition.

The main issue I'm having right now is file creation speed. I realize
that creating a file requires two network operations per file, but it
seems that something is horribly wrong in my configuration, judging
from the results of untarring the kernel.

I've tried moving the performance translators around, but some don't
seem to make much difference on the server side, and the ones that
appear to make some difference on the client side don't seem to help
the file creation issue.

On a side note, zresearch.com: I emailed through your contact form and
haven't heard back -- please provide a quote for generating the
configuration and contact me off-list.

===/etc/gluster/gluster-server.vol

volume posix
  type storage/posix
  option directory /gfsvol/data
end-volume

volume plocks
  type features/posix-locks
  subvolumes posix
end-volume

volume writebehind
  type performance/write-behind
  option flush-behind off        # default is 'off'
  subvolumes plocks
end-volume

volume readahead
  type performance/read-ahead
  option page-size 128kB         # 256KB is the default
  option page-count 4            # 2 is the default
  option force-atime-update off  # default is off
  subvolumes writebehind
end-volume

volume brick
  type performance/io-threads
  option thread-count 4          # default is 1
  option cache-size 64MB
  subvolumes readahead
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes brick
  option auth.ip.brick.allow 10.8.1.*,127.0.0.1
end-volume

===/etc/glusterfs/gluster-client.vol

volume brick1
  type protocol/client
  option transport-type tcp/client   # for TCP/IP transport
  option remote-host 10.8.1.9        # IP address of server1
  option remote-subvolume brick      # name of the remote volume on server1
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client   # for TCP/IP transport
  option remote-host 10.8.1.10       # IP address of server2
  option remote-subvolume brick      # name of the remote volume on server2
end-volume

volume afr
  type cluster/afr
  subvolumes brick1 brick2
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 0MB
  option flush-behind off        # default is 'off'
  subvolumes afr
end-volume

volume readahead
  type performance/read-ahead
  option page-size 128kB         # 256KB is the default
  option page-count 4            # 2 is the default
  option force-atime-update off  # default is off
  subvolumes writebehind
end-volume
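Returning to the file-creation issue: to separate per-file create cost
from raw throughput, a quick check I plan to run on both the glusterfs
mount and the bare RAID partition is a timed loop of empty-file
creates. The paths and the count of 1000 are just my test values:

# rough create-latency check; compare /gfs/test (glusterfs) against /gfsvol (local disk)
mkdir -p /gfs/test/createtest
time sh -c 'for i in $(seq 1 1000); do touch /gfs/test/createtest/f$i; done'
rm -rf /gfs/test/createtest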
A continuation:

I used XFS & MD RAID 1 on the partitions for the initial tests.
I tested reiser3 and reiser4 with no significant difference.
I re-raided to MD RAID 0 with XFS and received some improvement.

I NFS-mounted the partition and received bonnie++ numbers similar to
the best client-side AFR numbers I have been able to get, but
unpacking the kernel over NFSv4/UDP took 1 minute 47 seconds, compared
with 12 seconds for the bare drive, 41 seconds for server-side AFR,
and an average of 17 minutes for client-side AFR.

If I turn off AFR, whether I mount the remote machine over the net or
use the local server's brick, tar xjf of a kernel takes roughly 29
seconds.

Large files replicate almost at wire speed. rsync/cp -Rp of a large
directory takes considerable time.

Both QA releases of 1.4.0 I've attempted, 1.4.0qa32 and 1.4.0qa33,
have broken within minutes using my configurations. I'll turn debug
logs on and post summaries of those.
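On the large-file versus small-file point above, the comparison I'm
using is roughly the following; the source paths are just where my
copies of the data happen to live on the local RAID partition:

# one large file vs. an unpacked kernel tree (thousands of small files),
# both copied onto the glusterfs mount
dd if=/dev/zero of=/gfsvol/bigfile bs=1M count=512
time cp /gfsvol/bigfile /gfs/test/bigfile
time cp -Rp /gfsvol/linux-2.6.26.1 /gfs/test/linux-copy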
Here are my thoughts, mostly from a storage and networking
perspective; my understanding of gluster internals isn't very deep.

Obviously your best disk performance is going to be RAID 0+1. If
you're doing software RAID, pick a stripe size that best suits your
data profile, but if you're doing things like unpacking a kernel,
you'll have a mix of small and medium files and occasional large ones,
so I'm not sure you'll get much benefit from tweaking block sizes.
With this configuration, you should be able to write data at network
speed fairly easily.

Here are the timings that I can see coming into play:

Client-side AFR: writes happen to both servers. This means you can at
most write data at 50% of network speed; however, your writes happen
in parallel. The servers should be able to keep up with 50% of the
network speed (with GigE or InfiniBand the disk obviously becomes the
bottleneck, so a larger stripe set will help with this). If gluster
has to wait for a response from both servers before proceeding (which
would make sense), then you're as strong as the weakest link in the
chain.

Server-side AFR: writes happen to one server. However, this server
gets the data and now has to replicate it, so it sends a write to the
other server, waits for the response, and then responds back to the
client. Every transaction therefore takes twice as long, plus the
internal overhead of the gluster server process having to figure out
what to do, repack the data, and send it; let's just call that 10%
overhead. Now, since the server has data coming in and the same amount
of data going out (to the other server), your server can really only
take in half the data. So you're running at the same speed as
client-side AFR, but you're adding the internal overhead, and you
double your network latency since you're basically sending the same
data again.

So, if a file takes 10 seconds to transmit via client-side AFR: with
server-side AFR it gets to the first server in 5 seconds, the server
takes 0.5 seconds to do the AFR work, then spends 5 seconds sending to
the other server; it then gets a response and passes the response back
to the client. The network latency is basically doubled in this case,
so for simple math let's add another 10%, which puts us at 12 seconds.
However, since the server's port is taking data in and out, assume its
network interface can only be 50% saturated. So that same data takes
10 seconds to get to the first server and 10 seconds to get to the
second AFR server, plus the 2 seconds of overhead. So server-side AFR
takes roughly 2.2x as long as client-side AFR, if all my assumptions
are true.

What might solve some of the problem (and this would help both client
side and server side) is to use additional network ports: either the
server replicates over a different port, or the client talks to the
two servers over different ports (a rough sketch of what I mean is at
the end of this message). It would be interesting for you to rerun
your tests with a multi-NIC configuration in both scenarios. It's safe
to assume that at any speed, more is better :)

Depending on your port speeds, which I don't recall, but I think you
provided, your hardware disk configuration won't matter. At 100BaseT
you can probably do just as well with a single drive as with RAID 0,
1, or 0+1. With 1000BaseT or faster you will want a drive
configuration that can sustain the data transfer you'll be needing.

Hope that wasn't confusing.
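To make the second option concrete: the only client-side change I
would expect is pointing each protocol/client volume at an address on
a separate interface. A sketch reusing the volume layout from your
client config; the 10.8.2.x address is a hypothetical second-NIC
subnet, not something from your setup:

# sketch only: reach brick1 over one NIC and brick2 over another
volume brick1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.8.1.9        # server1 via the existing eth1 network
  option remote-subvolume brick
end-volume

volume brick2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.8.2.10       # server2 via a second dedicated interface (hypothetical)
  option remote-subvolume brick
end-volume

volume afr
  type cluster/afr
  subvolumes brick1 brick2
end-volume

The performance translators would stack on top of afr exactly as in
your current client volfile.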
On Aug 7, 2008, at 2:04 AM, Keith Freedman wrote:

> So server-side AFR takes roughly 2.2x as long as client-side AFR, if
> all my assumptions are true.
>
> What might solve some of the problem (and this would help both client
> side and server side) is to use additional network ports: either the
> server replicates over a different port, or the client talks to the
> two servers over different ports.

It's not a complex test, and it's not a complex setup. A
glusterfs-mounted partition, set up with client-side AFR on the
previously listed hardware over a dedicated GigE port, was used to
unpack:

$ ls -al linux-2.6.26.1.tar.bz2
-rw-r--r-- 1 daviesinc daviesinc 49459141 2008-08-01 19:04 linux-2.6.26.1.tar.bz2

The system was not I/O-bound on the network, nor CPU-bound. Neither
server's CPU went above 3% for either gluster process; the network
barely showed any activity and was at less than 12mb/second.

> It would be interesting for you to rerun your tests with a multi-NIC
> configuration in both scenarios.
>
> It's safe to assume that at any speed, more is better :)

So you believe that to untar/unbz2 a 49MB file in under 17 minutes, I
need to bond two GigE connections?

> Depending on your port speeds, which I don't recall, but I think you
> provided, your hardware disk configuration won't matter. At 100BaseT
> you can probably do just as well with a single drive as with RAID 0,
> 1, or 0+1. With 1000BaseT or faster you will want a drive
> configuration that can sustain the data transfer you'll be needing.

This is done under client-side AFR, with the file written to both
machines. A 4.3GB file almost hits wire speed between the nodes:

$ dd if=/dev/zero of=three bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 37.3989 s, 115 MB/s

$ time cp linux-2.6.26.1.tar.bz2 linux-2.6.26.1.tar.bz2.copy

real    0m0.573s
user    0m0.000s
sys     0m0.052s

I can copy the 49MB file in .57 seconds. During the tar xjf, switch
stats show almost 500pps and bandwidth almost hits 4mb/sec (megabits);
CPU shows glusterfs and glusterfsd at 1% each and bzip2 at roughly 2%;
tar rarely shows up in top, but when it does it's very close to the
bottom of the page at 1%.

$ time tar xjf linux-2.6.26.1.tar.bz2

real    18m6.799s
user    0m12.877s
sys     0m1.416s

I'm not convinced that this is a network or hardware problem.
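As a rough sanity check on where the time goes, dividing the
wall-clock time of that run by the number of entries tar created gives
a per-entry cost. If it comes out in the tens of milliseconds per
entry, that is far more than a handful of GigE round trips per create
should cost. A sketch, run from the directory containing the unpacked
tree (1087 is the 18m6.799s figure above, in seconds):

# seconds per entry for the tar xjf run above
FILES=$(find linux-2.6.26.1 | wc -l)
echo "$FILES entries created"
awk -v t=1087 -v n="$FILES" 'BEGIN { printf "%.1f ms per entry\n", t / n * 1000 }'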
On Aug 7, 2008, at 2:04 AM, Keith Freedman wrote:

> It's safe to assume that at any speed, more is better :)
>
> Depending on your port speeds (which I don't recall, but I think you
> provided them), your hardware disk configuration won't matter. At
> 100BaseT you can probably do just as well with a single drive as with
> a RAID 0, 1, or 0+1. With 1000BaseT or faster you will want a drive
> configuration that can sustain the data transfer you'll be needing.

Previous clientside AFR result:

$ time tar xjf linux-2.6.26.1.tar.bz2
real    18m6.799s
user    0m12.877s
sys     0m1.416s

Writing over GlusterFS to the local brick (AFR disabled):

$ time tar xjf linux-2.6.26.1.tar.bz2
real    0m32.783s
user    0m13.189s
sys     0m1.504s

Writing over GlusterFS to the remote brick (AFR disabled):

$ time tar xjf linux-2.6.26.1.tar.bz2
real    0m57.079s
user    0m12.165s
sys     0m1.120s

> Hope that wasn't confusing.
>
> At 10:05 PM 8/6/2008, Chris Davies wrote:
>> A continuation:
>>
>> I used XFS & MD RAID 1 on the partitions for the initial tests.
>> I tested reiser3 and reiser4 with no significant difference.
>> I re-raided to MD RAID 0 with XFS and received some improvement.
>>
>> I NFS-mounted the partition and received bonnie++ numbers similar to
>> the best clientside AFR numbers I have been able to get, but
>> unpacking the kernel using nfsv4/udp took 1 minute 47 seconds,
>> compared with 12 seconds for the bare drive, 41 seconds for
>> serverside AFR and an average of 17 minutes for clientside AFR.
>>
>> If I turn off AFR, whether I mount the remote machine over the net
>> or use the local server's brick, tar xjf of a kernel takes roughly
>> 29 seconds.
>>
>> Large files replicate almost at wire speed; rsync/cp -Rp of a large
>> directory takes considerable time.
>>
>> Both QA releases of 1.4.0 that I've tried, 1.4.0qa32 and 1.4.0qa33,
>> have broken within minutes using my configurations. I'll turn debug
>> logs on and post summaries of those.
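For a single-brick mount like the AFR-disabled comparisons above, a minimal client volfile should be enough: expose one protocol/client volume as the top of the graph and leave the cluster/afr layer out entirely. The snippet below is only a sketch reusing the brick names and addresses from the configs earlier in the thread, not the exact file used for these runs; swap 10.8.1.9 for 10.8.1.10 to hit the remote brick instead.

volume brick1
type protocol/client
option transport-type tcp/client
option remote-host 10.8.1.9 # or 10.8.1.10 for the remote-brick test
option remote-subvolume brick
end-volume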
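The jump from well under a minute without AFR to 18 minutes with clientside AFR points at per-file create and metadata latency rather than raw throughput. A quick way to measure that directly, as a rough sketch only (the mount path and file count are arbitrary), is to time the creation of a batch of empty files on the GlusterFS mount and divide by the count:

$ cd /gfs/test
$ time sh -c 'for i in $(seq 1 1000); do touch f$i; done'

Running the same loop on the bare RAID partition and comparing the per-file cost against a ping round trip on eth1 should show how much of the delay is network round trips versus something in the translator stack.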
At 08:20 AM 8/7/2008, Chris Davies wrote:

>I'm not convinced that this is a network or hardware problem.

It doesn't sound like it to me either. What do the server stats look like while you're untarring?

Hopefully one of the gluster devs will step in with some thoughts.
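One way to capture those server stats for the duration of an untar, assuming vmstat and iostat (from the sysstat package) are installed on both servers and using throwaway log paths, is to start the collectors in the background, run the test, then stop them:

$ vmstat 1 > /tmp/vmstat.untar.log &
$ iostat -x 1 > /tmp/iostat.untar.log &
$ time tar xjf /gfs/test/linux-2.6.26.1.tar.bz2
$ kill %1 %2

If the disks and CPUs on both servers sit mostly idle while the tar crawls, that would point back at round-trip latency in the AFR path rather than at the hardware.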
Chris,

Thanks for the benchmark numbers. I will check this. Recently I too observed this type of behavior; I will get back with some inputs and probably a fix.

Regards,
Amar

2008/8/7 Keith Freedman <freedman at freeformit.com>:

> At 08:20 AM 8/7/2008, Chris Davies wrote:
> >I'm not convinced that this is a network or hardware problem.
>
> It doesn't sound like it to me either. What do the server stats look
> like while you're untarring?
>
> Hopefully one of the gluster devs will step in with some thoughts.
--
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Super Storage!